SYN2025 Token Attributes

The SYN2025 corpus uses the Unified CNC Annotation Scheme for morphological annotation and lemmatization. Its syntactic annotation follows the dependency scheme of the analytic layer of the Prague Dependency Treebank (PDT).

The following token attributes are used:

word: word form
lc: lowercase word form
sword: word form with marked boundaries of syntactic words (in cases where the token is a multi‑word token, such as ‘ses’)

afun, p_afun, ep_afun: for each token, gives the syntactic function according to the analytic layer of the PDT
parent: relative position of the token on which the given token depends
eparent: only for autosemantic words; gives the relative position of the nearest autosemantic token on which the given token depends (skipping prepositions, conjunctions, etc.)
p_tag, p_lemma: tag and lemma of the governing token
ep_tag, ep_lemma: same as p_tag and p_lemma, but only for autosemantic words
prep: for nominals governed by a preposition, gives the lemma of the preposition
ord, p_ord (also in the SYN2025 corpus): position of the word in the sentence and position of the governing word in the sentence
p_pos, p_case, p_afun, p_verbtag; ep_pos, ep_case, ep_afun, ep_verbtag: further attributes derived from parent and eparent, also available in the SYN2025 corpus
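
The relationship between parent and the derived attributes p_ord and p_lemma can be illustrated with a minimal Python sketch. It is not part of the corpus tooling. The Token class, the function names and the three-word sample sentence are hypothetical, and the sketch assumes that a positive parent value points to the right, that 0 marks a token with no governing token, and that ord is counted from 1. None of these conventions is stated on this page, so they should be checked against the corpus data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Token:
    """Hypothetical container holding a few of the attributes listed above."""
    word: str
    lemma: str
    parent: int  # relative offset of the governing token (assumed: 0 = none)


def governing_token(sentence: List[Token], index: int) -> Optional[Token]:
    """Return the token that sentence[index] depends on, or None if it has none."""
    offset = sentence[index].parent
    if offset == 0:  # assumption: 0 marks the root of the sentence
        return None
    head = index + offset  # assumption: positive offsets point to the right
    if not 0 <= head < len(sentence):
        raise ValueError(f"parent offset {offset} points outside the sentence")
    return sentence[head]


def derived_attributes(sentence: List[Token], index: int) -> Dict[str, object]:
    """Recompute ord, p_ord and p_lemma for one token from its parent value."""
    token = sentence[index]
    head = governing_token(sentence, index)
    return {
        "ord": index + 1,  # assumption: ord is 1-based
        "p_ord": None if head is None else index + token.parent + 1,
        "p_lemma": None if head is None else head.lemma,
    }


# Invented sample sentence for illustration only (not taken from SYN2025).
sentence = [
    Token("Petr", "Petr", 1),     # depends on the next token
    Token("čte", "číst", 0),      # no governing token
    Token("knihu", "kniha", -1),  # depends on the previous token
]

for i, tok in enumerate(sentence):
    print(tok.word, derived_attributes(sentence, i))
```

The same lookup, applied to eparent instead of parent, would give the ep_ attributes, under the same assumptions about sign and root marking.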

— Tomáš Jelínek
