Differences

This shows you the differences between two versions of the page.

--- en:seznamy:syn2025_attributes [2026/02/16 12:53] – created tomasjelinek
+++ en:seznamy:syn2025_attributes [2026/02/18 17:38] (current) – michalkren
@@ Line 1: / Line 1: @@
 ====== SYN2025 Token Attributes ======
-The [[en:cnk:syn2025|SYN2025]] corpus uses the [[en:cnk:anotacni_standard_cnk|Unified CNC Annotation Scheme]] for morphological annotation and lemmatization and the dependency syntactic annotation as in [[http://ufal.mff.cuni.cz/pdt2.0/doc/manuals/cz/a-layer/html/index.html|the analytic layer of the Prague Dependency Treebank]] for syntactic annotation.
+The [[en:cnk:syn2025|SYN2025]] corpus uses the [[en:cnk:anotacni_standard_cnk|Unified CNC Annotation Scheme]] for morphological annotation and lemmatization and the dependency syntactic annotation as in [[http://ufal.mff.cuni.cz/pdt2.0/doc/manuals/cz/a-layer/html/index.html|the analytic layer of the Prague Dependency Treebank]] for [[en:pojmy:syntakticka_analyza|syntactic annotation]].
-The following token attributes are used:
+The following token (positional) attributes are used, in the given order:
+Basic token attributes
   * [[en:pojmy:word|word]]: word form
-  * lc: lowercase word form
   * sword: word form with marked boundaries of syntactic words (in cases where the token is a [[en:cnk:anotacni_standard_cnk?s[]=multiword#tokenization_lemmatization_and_tagging_of_multiword_tokens|multi‑word token]], such as ‘ses’)
+  * [[en:pojmy:lemma|lemma]]: the basic dictionary form of the word
+  * [[en:pojmy:lemma?s[]=sublemma#sublemma|sublemma]]: the base form distinguishing stylistic or other variant forms
+  * [[en:pojmy:tag|tag]]: morphological tag
+  * pos: part of speech
+  * case: grammatical case
+  * [[en:cnk:anotacni_standard_cnk#tagging_of_verb_formsthe_verbtag_attribute|verbtag]]: tag for verbal forms (including compound verb forms)
+-----
+Syntactic attributes
+  * ord: word order position in the sentence
+  * afun: syntactic function of the word
+  * parent: (relative) reference to the governing token (parent)
+-----
+Attributes derived from the governing token (parent)
+  * p_ord: order position of the parent token in the sentence
+  * p_lemma: lemma of the parent token
+  * p_sublemma: sublemma of the parent token
+  * p_tag: tag of the parent token
+  * p_pos: part of speech of the parent token
+  * p_case: case of the parent token
+  * p_verbtag: verbtag of the parent token
+  * p_afun: syntactic function of the parent token
+-----
+Syntactic attributes
+  * eparent: (relative) reference to the governing content word (effective parent)
+------
+Attributes derived from the governing content word (effective parent)
+  * ep_ord: order position of the effective parent in the sentence
+  * ep_lemma: lemma of the effective parent
+  * ep_sublemma: sublemma of the effective parent
+  * ep_tag: tag of the effective parent
+  * ep_pos: part of speech of the effective parent
+  * ep_case: case of the effective parent
+  * ep_verbtag: verbtag of the effective parent
+  * ep_afun: syntactic function of the effective parent
+------
+Additional syntactic attribute
+  * prep: lemma of the preposition governing the given word (if any)
-  * [[seznamy:afun|afun, p_afun, ep_afun]]: gives the syntactic function of each token according to the [[http://ufal.mff.cuni.cz/pdt2.0/doc/manuals/cz/a-layer/html/index.html|analytic layer of the PDT]]
-  * [[seznamy:parent|parent]]: relative position of the token on which the given token depends
-  * [[seznamy:eparent|eparent]]: only for content (autosemantic) words; gives the relative position of the nearest content token on which the given token depends (skipping prepositions, conjunctions, etc.)
-  * [[seznamy:p_tag|p_tag, p_lemma]]: tag and lemma of the governing token
-  * [[seznamy:p_tag|ep_tag, ep_lemma]]: same as ''p_tag'' and ''p_lemma'', but only for content words
-  * [[seznamy:prep|prep]]: for nominals governed by a preposition, gives the lemma of the preposition
-  * in the SYN2025 corpus also [[seznamy:ord|ord]] and p_ord: position of the word in the sentence and position of the governing word in the sentence
-  * the SYN2025 corpus also has more attributes derived from the parent and eparent attributes: p_pos, p_case, p_afun, p_verbtag; ep_pos, ep_case, ep_afun, ep_verbtag.
  --- //Tomáš Jelínek//
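
Because the current revision lists the attributes "in the given order", a token record can be read positionally. The following is only a minimal sketch under stated assumptions: it assumes a tab-separated export with one token per line (the page does not specify a file format), and it assumes that the relative references in ''parent'' and ''eparent'' are signed integer offsets from the token's own position (for example ''+2'' or ''-1''), with ''0'' or a non-numeric value meaning "no governing token". The names ''SYN2025_ATTRIBUTES'', ''parse_token_line'' and ''resolve_relative'' are hypothetical helpers, not part of any CNC tool.

```python
from typing import Dict, List, Optional

# Attribute order as listed in the current revision of this page.
SYN2025_ATTRIBUTES: List[str] = [
    # basic token attributes
    "word", "sword", "lemma", "sublemma", "tag", "pos", "case", "verbtag",
    # syntactic attributes
    "ord", "afun", "parent",
    # attributes derived from the parent
    "p_ord", "p_lemma", "p_sublemma", "p_tag", "p_pos", "p_case",
    "p_verbtag", "p_afun",
    # effective parent and attributes derived from it
    "eparent",
    "ep_ord", "ep_lemma", "ep_sublemma", "ep_tag", "ep_pos", "ep_case",
    "ep_verbtag", "ep_afun",
    # additional syntactic attribute
    "prep",
]


def parse_token_line(line: str) -> Dict[str, str]:
    """Map one token line onto attribute names.

    ASSUMPTION: the line is tab-separated and holds exactly one value per
    attribute, in the order given above.
    """
    values = line.rstrip("\n").split("\t")
    if len(values) != len(SYN2025_ATTRIBUTES):
        raise ValueError(
            f"expected {len(SYN2025_ATTRIBUTES)} columns, got {len(values)}"
        )
    return dict(zip(SYN2025_ATTRIBUTES, values))


def resolve_relative(index: int, reference: str, sentence_length: int) -> Optional[int]:
    """Turn a relative reference into an absolute 0-based index in the sentence.

    ASSUMPTION: the reference is a signed integer offset such as "+2" or "-1";
    "0", a non-numeric value, or a target outside the sentence yields None.
    """
    try:
        offset = int(reference)
    except ValueError:
        return None
    if offset == 0:
        return None
    target = index + offset
    return target if 0 <= target < sentence_length else None
```

With these helpers, ''resolve_relative(i, token["parent"], n)'' would give the position of the governing token of the ''i''-th token in a sentence of ''n'' tokens, and the same call with ''token["eparent"]'' would give the effective parent, provided the offset assumption holds for the data at hand.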
