
SYN2025 Token Attributes

The SYN2025 corpus uses the Unified CNC Annotation Scheme for morphological annotation and lemmatization. Its syntactic annotation follows the dependency scheme of the analytic layer of the Prague Dependency Treebank.

The following token attributes are used:

Basic token attributes

word: word form
lc: lowercase word form
sword: word form with marked boundaries of syntactic words (in cases where the token is a multi‑word token, such as ‘ses’)
lemma: the basic dictionary form of the word
sublemma: the base form distinguishing stylistic or other variant forms
tag: morphological tag
pos: part of speech
case: grammatical case
verbtag: tag for verbal forms (including compound verb forms)

Syntactic attributes

ord: word order position in the sentence
afun: syntactic function of the word
parent: reference to the governing (head) word (relative index)
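
As a rough illustration of how a relative parent reference might be resolved to an absolute position, the sketch below assumes that parent stores a signed offset from the token's own ord to the ord of its head (so "+2" means two tokens to the right) and that "0" or an empty value marks a token without a head. Both conventions are assumptions made for this example and are not stated on this page; the function name resolve_parent_ord is hypothetical.

```python
from typing import Optional


def resolve_parent_ord(ord_: int, parent: str) -> Optional[int]:
    """Return the absolute position of the head, or None if there is none.

    Assumption: `parent` is a signed offset such as "+2" or "-1",
    and "0" (or an empty value) marks a token with no head.
    """
    offset = int(parent) if parent.strip() else 0
    if offset == 0:
        return None
    return ord_ + offset


print(resolve_parent_ord(3, "+2"))  # head two tokens to the right
print(resolve_parent_ord(3, "-1"))  # head one token to the left
print(resolve_parent_ord(1, "0"))   # no head
```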

Attributes derived from the governing (parent) token

p_ord: order position of the parent token in the sentence
p_lemma: lemma of the parent token
p_sublemma: sublemma of the parent token
p_tag: tag of the parent token
p_pos: part of speech of the parent token
p_case: case of the parent token
p_verbtag: verbtag of the parent token
p_afun: syntactic function of the parent token
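
The p_ attributes listed above can be thought of as copies of the parent token's own attributes. The following minimal sketch shows one way such a derivation could work. It assumes that tokens are plain dictionaries, that ord is 1-based, and that parent is a signed integer offset to the head with 0 for a token without a head. The two-token sentence is toy data, and the helper add_parent_attributes is hypothetical, not part of any CNC tool.

```python
PARENT_FIELDS = ("ord", "lemma", "sublemma", "tag", "pos", "case", "verbtag", "afun")


def add_parent_attributes(sentence):
    """Return copies of the tokens with p_* attributes filled in.

    Assumptions: each token is a dict, `ord` is 1-based, and `parent`
    is a signed integer offset to the head (0 for a token with no head).
    """
    by_ord = {tok["ord"]: tok for tok in sentence}
    enriched = []
    for tok in sentence:
        head = by_ord.get(tok["ord"] + tok["parent"]) if tok["parent"] else None
        new_tok = dict(tok)
        for field in PARENT_FIELDS:
            # Tokens without a head get an empty placeholder value.
            new_tok["p_" + field] = head.get(field, "") if head else ""
        enriched.append(new_tok)
    return enriched


# Toy sentence "Pes štěká" with only a few attributes filled in.
sentence = [
    {"ord": 1, "word": "Pes", "lemma": "pes", "pos": "N", "afun": "Sb", "parent": 1},
    {"ord": 2, "word": "štěká", "lemma": "štěkat", "pos": "V", "afun": "Pred", "parent": 0},
]

for tok in add_parent_attributes(sentence):
    print(tok["word"], tok["p_lemma"], tok["p_afun"])
```

The input tokens are copied rather than modified, so the original sentence stays usable for other derivations.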

Attributes derived from the nearest governing content word (effective parent)

ep_ord: order position of the effective parent in the sentence
ep_lemma: lemma of the effective parent
ep_sublemma: sublemma of the effective parent
ep_tag: tag of the effective parent
ep_pos: part of speech of the effective parent
ep_case: case of the effective parent
ep_verbtag: verbtag of the effective parent
ep_afun: syntactic function of the effective parent
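
The effective parent differs from the parent when the immediate head is a function word. One plausible way to find it is to climb the dependency tree until a head that is not a function word is reached. The sketch below makes several assumptions not confirmed by this page: function words are recognized by the afun values AuxP (preposition) and AuxC (subordinating conjunction), heads are given as absolute positions, and the helper effective_parent is hypothetical. The actual rules used for SYN2025 may differ.

```python
SKIPPED_AFUNS = {"AuxP", "AuxC"}  # assumed set of function-word labels


def effective_parent(tokens, ord_):
    """Return the ord of the nearest non-skipped ancestor, or None.

    `tokens` maps ord -> dict with "afun" and "head" (absolute ord of
    the head, None for a token with no head). This layout is an assumption.
    """
    head = tokens[ord_]["head"]
    seen = set()
    while head is not None and tokens[head]["afun"] in SKIPPED_AFUNS:
        if head in seen:  # guard against malformed cyclic input
            return None
        seen.add(head)
        head = tokens[head]["head"]
    return head


# Toy sentence "Jdu do školy": the noun depends on the preposition,
# which in turn depends on the verb.
tokens = {
    1: {"lemma": "jít", "afun": "Pred", "head": None},
    2: {"lemma": "do", "afun": "AuxP", "head": 1},
    3: {"lemma": "škola", "afun": "Adv", "head": 2},
}

ep = effective_parent(tokens, 3)
print(tokens[ep]["lemma"] if ep is not None else None)
```

In this toy example the parent of the noun is the preposition, while its effective parent is the verb.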

Additional syntactic attribute

prep: lemma of the preposition governing the given word (if any)

— Tomáš Jelínek
