SYN2025 Token Attributes
The SYN2025 corpus uses the Unified CNC Annotation Scheme for morphological annotation and lemmatization. Its syntactic annotation is dependency-based and follows the analytic layer of the Prague Dependency Treebank.
The following token attributes are used:
Basic token attributes
- word: word form
- lc: lowercase word form
- sword: word form with marked boundaries of syntactic words (in cases where the token is a multi‑word token, such as ‘ses’)
- lemma: the basic dictionary form of the word
- sublemma: the base form distinguishing stylistic or other variant forms
- tag: morphological tag
- pos: part of speech
- case: grammatical case
- verbtag: tag for verbal forms (including compound verb forms)
Syntactic attributes
- ord: word order position in the sentence
- afun: syntactic function of the word
- parent: reference to the governing (head) word (relative index)
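Because parent is a relative index, the position of the head in the sentence can be recovered by adding it to the token's own ord. The following Python sketch illustrates this under stated assumptions: that parent is a signed offset such as "+1" or "-1", and that an offset of 0 marks a token without a head. Neither the encoding nor the function name resolve_parent comes from the corpus documentation; both are hypothetical.

```python
from typing import Optional


def resolve_parent(ord_: int, parent: str) -> Optional[int]:
    """Return the absolute position of the head, or None for a root token.

    Assumes `parent` is a signed offset relative to `ord_` ("+2", "-1")
    and that an offset of 0 means the token has no head.
    """
    offset = int(parent)  # int() accepts a leading "+" or "-"
    if offset == 0:
        return None
    return ord_ + offset


# Hypothetical three-token sentence: (ord, word, parent)
sentence = [(1, "Petr", "+1"), (2, "spí", "0"), (3, "doma", "-1")]
heads = {word: resolve_parent(o, p) for o, word, p in sentence}
```

In this made-up sentence, both "Petr" and "doma" resolve to position 2, the verb, which itself has no head.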
Attributes derived from the governing (parent) token
- p_ord: order position of the parent token in the sentence
- p_lemma: lemma of the parent token
- p_sublemma: sublemma of the parent token
- p_tag: tag of the parent token
- p_pos: part of speech of the parent token
- p_case: case of the parent token
- p_verbtag: verbtag of the parent token
- p_afun: syntactic function of the parent token
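The p_ attributes can be thought of as copies of the parent token's own attributes under a prefixed name. A minimal Python sketch of that idea follows. The dictionary representation of tokens, the helper add_parent_attributes, and the empty-string placeholder for tokens without a parent are assumptions made for illustration, not the corpus's documented behaviour.

```python
from typing import Dict, List

# Attributes copied from the parent token, following the list above.
COPIED = ["ord", "lemma", "sublemma", "tag", "pos", "case", "verbtag", "afun"]


def add_parent_attributes(tokens: List[Dict[str, str]], heads: List[int]) -> None:
    """Add p_* attributes to every token in place.

    `heads[i]` is the 0-based index of the head of `tokens[i]`, or -1 for
    a token without a head. Missing values become "" (assumed placeholder).
    """
    for token, head in zip(tokens, heads):
        for name in COPIED:
            value = tokens[head].get(name, "") if head >= 0 else ""
            token["p_" + name] = value


# Hypothetical two-token sentence with only a few attributes filled in.
tokens = [
    {"ord": "1", "lemma": "Petr", "afun": "Sb"},
    {"ord": "2", "lemma": "spát", "afun": "Pred"},
]
add_parent_attributes(tokens, heads=[1, -1])
```

After the call, the first token carries p_lemma "spát" and p_afun "Pred", while the second token, which has no parent, gets empty p_ values.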
Attributes derived from the nearest governing content word (effective parent)
- ep_ord: order position of the effective parent in the sentence
- ep_lemma: lemma of the effective parent
- ep_sublemma: sublemma of the effective parent
- ep_tag: tag of the effective parent
- ep_pos: part of speech of the effective parent
- ep_case: case of the effective parent
- ep_verbtag: verbtag of the effective parent
- ep_afun: syntactic function of the effective parent
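One way to picture the effective parent is as the result of walking up the chain of heads until a content word is reached. The Python sketch below assumes that the skipped function words are exactly the nodes whose afun is AuxP (preposition) or AuxC (subordinating conjunction), as in PDT-style analytical trees. The rules actually used for SYN2025, for example the treatment of coordination, are not described here and may differ. The function name and data layout are hypothetical.

```python
from typing import List, Optional

# Assumed set of function-word labels to skip (PDT-style afun values).
SKIPPED_AFUNS = {"AuxP", "AuxC"}


def effective_parent(
    index: int, heads: List[Optional[int]], afuns: List[str]
) -> Optional[int]:
    """Return the index of the nearest governing node whose afun is not skipped.

    `heads[i]` is the 0-based index of the head of token i, or None for the root.
    """
    head = heads[index]
    while head is not None and afuns[head] in SKIPPED_AFUNS:
        head = heads[head]  # climb past the function word
    return head


# Hypothetical sentence "Petr spí v posteli" ("Petr sleeps in bed").
words = ["Petr", "spí", "v", "posteli"]
heads = [1, None, 1, 2]
afuns = ["Sb", "Pred", "AuxP", "Adv"]
ep_of_posteli = effective_parent(3, heads, afuns)
```

Here the parent of "posteli" is the preposition "v", but its effective parent under these assumptions is the verb "spí" (index 1), because the AuxP node is skipped.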
Additional syntactic attribute
- prep: lemma of the preposition governing the given word (if any)
— Tomáš Jelínek