Nastavení

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
en:cnk:syn2015 [2020/09/01 17:33]
Michal Křen [Text classification]
en:cnk:syn2015 [2020/09/01 17:46] (current)
Michal Křen [Positional annotation and tagging]
Line 33: Line 33:
 ==== Text classification ==== ==== Text classification ====
  
-The original **text classification** scheme of the SYN series has been updated and revised; both original and revised classifications are based on text-external criteria that reflect predominant function of a text. The revision has been made with respect to comparability with the original scheme, with the most significant change made to the sub-classification of non-fiction adopted from the [[http://www.en.nkp.cz|Czech National Library]] and more detailed classification of newspaper texts.+The original **text classification** scheme of the SYN series has been [[https://wiki.korpus.cz/doku.php/en:cnk:klasifikace_textu_syn2015|updated and revised]]; both original and revised classifications are based on text-external criteria that reflect predominant function of a text. The revision has been made with respect to comparability with the original scheme, with the most significant change made to the sub-classification of non-fiction adopted from the [[http://www.en.nkp.cz|Czech National Library]] and more detailed classification of newspaper texts.
  
 ^ Txtype_group ^ Portion ^ ^ Txtype_group ^ Portion ^
Line 73: Line 73:
 | LEI | | leisure magazines |  13,33 % | | LEI | | leisure magazines |  13,33 % |
  
-A detailed information about the text classification scheme is available on a [[https://wiki.korpus.cz/doku.php/en:cnk:klasifikace_textu_syn2015|here]].+A detailed information about the text classification scheme is available [[https://wiki.korpus.cz/doku.php/en:cnk:klasifikace_textu_syn2015|here]].
  
 ==== Concept of synchronicity ==== ==== Concept of synchronicity ====
Line 91: Line 91:
 Compared to previous versions there have been improvements in [[en:pojmy:lemma|lemmatization]] and [[en:pojmy:morfologicka_analyza|morphological tagging]]; both are almost identical to the processes used for the corpus [[en:cnk:syn2013pub|SYN2013PUB]], nonetheless SYN2015 was processed using the newest versions of all the tools (the improvements relate both to the morphological dictionary and to the rule-based [[en:pojmy:desambiguace|disambiguation]]). Furthermore, the lemmatization of punctuation marks has changed, preserving the form of the characters as much as possible.  Compared to previous versions there have been improvements in [[en:pojmy:lemma|lemmatization]] and [[en:pojmy:morfologicka_analyza|morphological tagging]]; both are almost identical to the processes used for the corpus [[en:cnk:syn2013pub|SYN2013PUB]], nonetheless SYN2015 was processed using the newest versions of all the tools (the improvements relate both to the morphological dictionary and to the rule-based [[en:pojmy:desambiguace|disambiguation]]). Furthermore, the lemmatization of punctuation marks has changed, preserving the form of the characters as much as possible. 
  
 +Last but not least, SYN2015 is the first CNC corpus featuring a **[[https://wiki.korpus.cz/doku.php/en:pojmy:syntakticka_analyza|syntactic annotation]]**.
  
 ====== How to cite SYN2015 ====== ====== How to cite SYN2015 ======