SYN2025 Token Attributes

The SYN2025 corpus uses the Unified CNC Annotation Scheme for morphological annotation and lemmatization. Its syntactic annotation is dependency-based and follows the analytic layer of the Prague Dependency Treebank.

The following token (positional) attributes are used, in the given order:

Basic token attributes

word: word form
sword: word form with the boundaries of syntactic words marked (relevant when the token is a multi‑word token, such as ‘ses’)
lemma: the basic dictionary form of the word
sublemma: the base form distinguishing stylistic or other variant forms
tag: morphological tag
pos: part of speech
case: grammatical case
verbtag: tag for verbal forms (including compound verb forms)
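
Because the attributes are positional, the order of the list above is what gives each column its meaning. A minimal Python sketch of that mapping follows. It assumes one token per line with tab-separated columns, which this page does not state, and the function name parse_token_line is hypothetical. Only the eight basic attributes are named; any further columns (for example the syntactic attributes) are ignored.

```python
from typing import Dict, List

# Basic token attributes, in the order given in the list above.
BASIC_ATTRIBUTES: List[str] = [
    "word", "sword", "lemma", "sublemma",
    "tag", "pos", "case", "verbtag",
]


def parse_token_line(line: str, sep: str = "\t") -> Dict[str, str]:
    """Map the leading columns of one token line to the basic attribute names.

    Assumption: columns are separated by `sep` (tab by default); the
    actual export format of the corpus may differ.
    """
    fields = line.rstrip("\n").split(sep)
    if len(fields) < len(BASIC_ATTRIBUTES):
        raise ValueError(
            f"expected at least {len(BASIC_ATTRIBUTES)} columns, got {len(fields)}"
        )
    # zip() stops after the eight named attributes, so extra columns are dropped.
    return dict(zip(BASIC_ATTRIBUTES, fields))


if __name__ == "__main__":
    # Placeholder values only, not real corpus data.
    sample = "\t".join(f"<{name}>" for name in BASIC_ATTRIBUTES)
    token = parse_token_line(sample)
    print(token["lemma"])
```

Keeping the attribute names in a single ordered list means a change in column order only has to be made in one place.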

Syntactic attributes

Attributes derived from the governing token (parent)

Attributes derived from the governing content word (effective parent)

Additional syntactic attribute

— Tomáš Jelínek