
Differences

This shows you the differences between two versions of the page.

en:seznamy:syn2025_attributes [2026/02/16 13:23] tomasjelinek
en:seznamy:syn2025_attributes [2026/02/18 17:38] (current) michalkren
Line 1: Line 1:
 ====== SYN2025 Token Attributes ======
 
-The [[en:cnk:syn2025|SYN2025]] corpus uses the [[en:cnk:anotacni_standard_cnk|Unified CNC Annotation Scheme]] for morphological annotation and lemmatization and the dependency syntactic annotation as in [[http://ufal.mff.cuni.cz/pdt2.0/doc/manuals/cz/a-layer/html/index.html|the analytic layer of the Prague Dependency Treebank]] for syntactic annotation.
+The [[en:cnk:syn2025|SYN2025]] corpus uses the [[en:cnk:anotacni_standard_cnk|Unified CNC Annotation Scheme]] for morphological annotation and lemmatization and the dependency syntactic annotation as in [[http://ufal.mff.cuni.cz/pdt2.0/doc/manuals/cz/a-layer/html/index.html|the analytic layer of the Prague Dependency Treebank]] for [[en:pojmy:syntakticka_analyza|syntactic annotation]].
 
-The following token attributes are used:
+The following token (positional) attributes are used, in the given order:
 
 Basic token attributes
   * [[en:pojmy:word|word]]: word form
-  * lc: lowercase word form
   * sword: word form with marked boundaries of syntactic words (in cases where the token is a [[en:cnk:anotacni_standard_cnk?s[]=multiword#tokenization_lemmatization_and_tagging_of_multiword_tokens|multi‑word token]], such as ‘ses’
-  * lemma: the basic dictionary form of the word
+  * [[en:pojmy:lemma|lemma]]: the basic dictionary form of the word
-  * sublemma: the base form distinguishing stylistic or other variant forms
+  * [[en:pojmy:lemma?s[]=sublemma#sublemma|sublemma]]: the base form distinguishing stylistic or other variant forms
-  * tag: morphological tag
+  * [[en:pojmy:tag|tag]]: morphological tag
   * pos: part of speech
   * case: grammatical case
-  * verbtag: tag for verbal forms (including compound verb forms)
+  * [[en:cnk:anotacni_standard_cnk#tagging_of_verb_formsthe_verbtag_attribute|verbtag]]: tag for verbal forms (including compound verb forms)
 
 -----
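
Because the new revision states that the attributes come "in the given order", the basic token attributes can be read by position. The following is a minimal sketch, assuming the corpus is exported in a tab-separated vertical format with one token per line; that format, the function name ''parse_basic'' and the sample values are assumptions for illustration and are not stated on this page.

```python
# Basic token attributes in the order listed in the new revision
# (the "lc" attribute was removed, so it is not included here).
BASIC_ATTRS = ["word", "sword", "lemma", "sublemma", "tag", "pos", "case", "verbtag"]


def parse_basic(line: str) -> dict:
    """Map the leading columns of one token line to the basic attribute names.

    Assumption: columns are tab-separated. Any columns after the basic
    attributes (syntactic attributes etc.) are ignored by this sketch.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(BASIC_ATTRS):
        raise ValueError(
            f"expected at least {len(BASIC_ATTRS)} columns, got {len(fields)}"
        )
    # zip() stops at the shorter sequence, so extra columns are dropped.
    return dict(zip(BASIC_ATTRS, fields))


# Made-up sample line; the values are illustrative, not taken from SYN2025.
sample = "psa\tpsa\tpes\tpes\tNNMS4-----A----\tN\t4\t-\n"
row = parse_basic(sample)
print(row["word"], row["lemma"], row["case"])
```

Keeping the attribute names in a single list means that a change in the column order, such as the removal of ''lc'' in this revision, only needs to be reflected in one place.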
Line 21: Line 20:
   * ord: word order position in the sentence
   * afun: syntactic function of the word
-  * parent: reference to the governing (head) word (relative index)
+  * parent: (relative) reference to the governing token (parent)
 
 -----
 
-Attributes derived from the governing (parent) token
+Attributes derived from the governing token (parent)
   * p_ord: order position of the parent token in the sentence
   * p_lemma: lemma of the parent token
Line 34: Line 33:
   * p_verbtag: verbtag of the parent token
   * p_afun: syntactic function of the parent token
 +
 +-----
 +
 +Syntactic attributes
 +  * eparent: (relative) reference to the governing content word (effective parent)
  
 ------
 
-Attributes derived from the nearest governing content word (effective parent)
+Attributes derived from the governing content word (effective parent)
 
   * ep_ord: order position of the effective parent in the sentence
Line 53: Line 57:
  
   * prep: lemma of the preposition governing the given word (if any)
 +
  --- //Tomáš Jelínek//
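
The ''parent'' and ''eparent'' attributes are described as relative references, while ''ord'' gives the token's position in the sentence. A relative reference can therefore be turned into a sentence position by adding it to ''ord''. The sketch below assumes that the value is a signed integer offset written as text (such as ''+2'' or ''-1''), that ''0'' marks a token with no governing token, and that ''ord'' is 1-based; none of these details are specified on this page, and the function name is hypothetical.

```python
from typing import Optional


def parent_position(ord_in_sentence: int, parent: str) -> Optional[int]:
    """Turn a relative parent reference into a position in the sentence.

    Assumptions (not confirmed by the page): `parent` is a signed integer
    offset such as "+2" or "-1", and "0" means the token has no parent.
    """
    offset = int(parent)  # int() accepts an explicit sign, e.g. "+2"
    if offset == 0:
        return None  # assumed marker for the root of the sentence
    return ord_in_sentence + offset


# A token at position 3 whose parent is two tokens to the right.
print(parent_position(3, "+2"))
# A token at position 3 whose parent is the preceding token.
print(parent_position(3, "-1"))
```

Under these assumptions the result should agree with the ''p_ord'' attribute for ''parent'' and with ''ep_ord'' for ''eparent'', which gives a simple consistency check on exported data.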