The SYN2025 corpus uses the Unified CNC Annotation Scheme for morphological annotation and lemmatization. Its syntactic annotation is dependency-based and follows the analytic layer of the Prague Dependency Treebank.
The following token (positional) attributes are used, in the given order:
Basic token attributes
-
sword: word form with marked boundaries of syntactic words (in cases where the token is a multi‑word token, such as ‘ses’)
lemma: the basic dictionary form of the word
sublemma: the base form distinguishing stylistic or other variant forms
-
pos: part of speech
case: grammatical case
verbtag: tag for verbal forms (including compound verb forms)
Syntactic attributes
ord: word order position in the sentence
afun: syntactic function of the word
parent: (relative) reference to the governing token (parent)
Attributes derived from the governing token (parent)
p_ord: order position of the parent token in the sentence
p_lemma: lemma of the parent token
p_sublemma: sublemma of the parent token
p_tag: tag of the parent token
p_pos: part of speech of the parent token
p_case: case of the parent token
p_verbtag: verbtag of the parent token
p_afun: syntactic function of the parent token
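As an illustration of how the parent-derived attributes relate to the basic ones, the following minimal Python sketch copies values from the governing token onto the dependent token. It rests on assumptions not stated above: that `ord` is 1-based, that `parent` holds a signed offset from the token to its governing token, and that 0 marks the sentence root. The `Token` class and both helper functions are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    ord: int     # position in the sentence (assumed 1-based)
    lemma: str
    afun: str
    parent: int  # ASSUMPTION: signed offset to the governing token, 0 = root


def parent_of(sentence: list[Token], token: Token) -> Optional[Token]:
    """Return the governing token, or None for the root (assumed encoding)."""
    if token.parent == 0:
        return None
    # Convert the relative reference to a 0-based list index.
    return sentence[token.ord + token.parent - 1]


def parent_attributes(sentence: list[Token], token: Token) -> dict[str, str]:
    """Build a few p_* attributes by copying values from the governing token."""
    head = parent_of(sentence, token)
    if head is None:
        # The root has no governing token; empty strings are a placeholder choice.
        return {"p_ord": "", "p_lemma": "", "p_afun": ""}
    return {"p_ord": str(head.ord), "p_lemma": head.lemma, "p_afun": head.afun}


# Toy sentence "Pes štěká hlasitě" with hand-assigned values.
sentence = [
    Token(1, "pes", "Sb", +1),
    Token(2, "štěkat", "Pred", 0),
    Token(3, "hlasitě", "Adv", -1),
]

for tok in sentence:
    print(tok.lemma, parent_attributes(sentence, tok))
```

The remaining p_* attributes (p_sublemma, p_tag, p_pos, p_case, p_verbtag) would be copied from the governing token in the same way.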
Attributes derived from the governing content word (effective parent)
ep_ord: order position of the effective parent in the sentence
ep_lemma: lemma of the effective parent
ep_sublemma: sublemma of the effective parent
ep_tag: tag of the effective parent
ep_pos: part of speech of the effective parent
ep_case: case of the effective parent
ep_verbtag: verbtag of the effective parent
ep_afun: syntactic function of the effective parent
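The difference between the parent and the effective parent can be sketched as follows. This is only one plausible reading of "governing content word", not the corpus's documented procedure: it assumes that function-word nodes are those with the afun values `AuxP` (preposition) or `AuxC` (subordinating conjunction), and that the effective parent is found by climbing past such nodes. The `Node` class, the absolute `head` field, and `effective_parent` are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: nodes with these afun values are treated as function words to skip.
FUNCTION_AFUNS = {"AuxP", "AuxC"}


@dataclass
class Node:
    ord: int    # position in the sentence (assumed 1-based)
    lemma: str
    afun: str
    head: int   # illustrative: absolute ord of the governing token, 0 = root


def effective_parent(sentence: list[Node], node: Node) -> Optional[Node]:
    """Climb the tree until a governing node that is not a function word is found."""
    by_ord = {n.ord: n for n in sentence}
    current = by_ord.get(node.head)
    while current is not None and current.afun in FUNCTION_AFUNS:
        current = by_ord.get(current.head)
    return current


# Toy phrase "jít do školy": the noun hangs on the preposition,
# and the preposition hangs on the verb.
sentence = [
    Node(1, "jít", "Pred", 0),
    Node(2, "do", "AuxP", 1),
    Node(3, "škola", "Adv", 2),
]

noun = sentence[2]
ep = effective_parent(sentence, noun)
print(noun.lemma, "-> parent ord:", noun.head, "| effective parent:", ep.lemma if ep else None)
```

Under these assumptions the noun's p_lemma would be the preposition, while its ep_lemma would be the verb, which is what makes the ep_* attributes convenient for querying relations between content words.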
Additional syntactic attribute
— Tomáš Jelínek