
Differences

This shows you the differences between two versions of the page.

en:cnk:syn:verze13 [2024/12/27 17:28] – [How to cite SYN version 13] michalkren
en:cnk:syn:verze13 [2026/01/23 11:49] (current) – [Structure and annotation of SYN version 13] krivan
Line 4:
 <WRAP right 35%>
 ^ <fs medium>Name</fs> ^^ <fs medium>SYN version 13</fs> ^
-^ [[pojmy:atributy_pozicni|Position]] ^ Number of tokens |  6 238 142 297 |
+^ [[pojmy:atributy_pozicni|Position]] ^ Number of tokens |  6 400 899 055 |
-^ ::: ^ Number of tokens without punctuation  |  5 174 701 189 |
+^ ::: ^ Number of tokens without punctuation  |  5 310 635 949 |
-^ ::: ^ Number of [[en:pojmy:word|word forms]]  |  11 384 712 |
+^ ::: ^ Number of [[en:pojmy:word|word forms]]  |  11 522 926 |
-^ ::: ^ Number of [[en:pojmy:lemma|lemmas]] |  7 604 956 |
+^ ::: ^ Number of [[en:pojmy:lemma|lemmas]] |  7 655 932 |
-^ Structures ^ Number of documents |  144 755 |
+^ Structures ^ Number of documents |  151 076 |
-^ ::: ^ Number of texts |  18 965 216 |
+^ ::: ^ Number of texts |  19 363 730 |
-^ ::: ^ Number of sentences |  398 423 123 |
+^ ::: ^ Number of sentences |  408 749 819 |
 ^ Other information ^ Referential |  YES |
 ^ ::: ^ Representative |  NO (predominantly journalism) |
-^ ::: ^ Publication year |  2023 |
+^ ::: ^ Publication year |  2024 |
 </WRAP>
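
The size figures changed in the table above can be compared with a little arithmetic. The following is a minimal sketch in Python and is not part of the wiki page itself: the names `COUNTS`, `growth`, and `summarize` are hypothetical, and the numbers are copied by hand from the removed (`-`) and added (`+`) rows.

```python
# Size figures copied from the removed (-) and added (+) table rows above.
# Each value is a (before, after) pair; the dictionary keys are informal labels.
COUNTS = {
    "tokens": (6_238_142_297, 6_400_899_055),
    "tokens without punctuation": (5_174_701_189, 5_310_635_949),
    "word forms": (11_384_712, 11_522_926),
    "lemmas": (7_604_956, 7_655_932),
    "documents": (144_755, 151_076),
    "texts": (18_965_216, 19_363_730),
    "sentences": (398_423_123, 408_749_819),
}


def growth(before: int, after: int) -> tuple[int, float]:
    """Return the absolute change and the change in percent of `before`."""
    delta = after - before
    return delta, 100.0 * delta / before


def summarize(counts: dict[str, tuple[int, int]]) -> dict[str, tuple[int, float]]:
    """Apply `growth` to every (before, after) pair."""
    return {name: growth(before, after) for name, (before, after) in counts.items()}


if __name__ == "__main__":
    for name, (delta, percent) in summarize(COUNTS).items():
        print(f"{name}: +{delta:,} ({percent:.2f} %)")
```

Running the script prints one line per row with the absolute and relative increase, which makes it easy to see that every size figure grew between the two revisions.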
  
Line 45:
 ====== Structure and annotation of SYN version 13 ======
 
-Generally speaking, structure and annotation of SYN version 13 are based on that of the SYN2020 corpus. In particular, hierarchy of structural tags for SYN version 13 has been taken over from SYN2020, as well as the [[en:cnk:syn2020#annotation_of_syn2020changes_compared_to_other_corpora_of_the_syn_series|lemmatization and morphological tagging]]. In this respect, SYN version 13 is the same as its predecessor, [[en:cnk:syn:verze12|SYN version 12]].
+Generally speaking, structure and annotation of SYN version 13 are based on that of the SYN2020 corpus. Hierarchy of structural tags for SYN version 13 has been taken over from SYN2020. Morphological tagging, lemmatization, and tokenization of the corpus are performed fully automatically according to the [[en:cnk:anotacni_standard_cnk|unified CNC annotation scheme]]. In this respect, SYN version 13 is the same as its predecessor, [[en:cnk:syn:verze12|SYN version 12]].
 
 The correspondence of structure and annotation between SYN version 13 and [[en:cnk:syn2020|SYN2020]] only has the following exceptions: