Differences

This shows you the differences between two versions of the page.

en:cnk:syn2010 [2016/12/11 16:26]
Veronika Pojarová [Corpus SYN2010]
en:cnk:syn2010 [2016/12/11 16:27] (current)
Veronika Pojarová
Line 3: Line 3:
  
 SYN2010 is a synchronic representative corpus of written Czech comprising 100 million tokens. It is a sequel to the corpora [[en:cnk:SYN2000]] and [[en:cnk:SYN2005]] and together with them forms a series of synchronic representative corpora that cover three successive periods.
-**All corpora contain different texts and are therefore disjunctive**. The basic characteristic freatures of the SYN2010 are identical to those of the corpus [[en:SYN2005|SYN2005]], which is predominantly related to the same conception of [[en:pojmy:reprezentativnost|representativeness]] based on the reception of written language and the resulting composition of the corpus. The SYN2010 corpus is [[en:pojmy:lemma|lemmatized]] and [[en:pojmy:tag|morphologically tagged]].
+**All corpora contain different texts and are therefore disjunctive**. The basic characteristic features of the SYN2010 are identical to those of the corpus [[en:cnk:SYN2005|SYN2005]], which is predominantly related to the same conception of [[en:pojmy:reprezentativnost|representativeness]] based on the reception of written language and the resulting composition of the corpus. The SYN2010 corpus is [[en:pojmy:lemma|lemmatized]] and [[en:pojmy:tag|morphologically tagged]].
  
  
Line 22: Line 22:
 ====== Changes compared to the SYN2005 corpus ======
  
-Compared to the corpus [[en:SYN2005|SYN2005]], the SYN2010 corpus saw **significant improvements in lemmatization** and **[[en:pojmy:tag|morphological tagging]]**; both basically identical to the processing of the [[en:SYN2009PUB|SYN2009PUB]] corpus. Therefore, although [[en:SYN2005|SYN2005]] and SYN2010 do not differ in their understanding of [[en:pojmy:reprezentativnost|representativeness]], **these differences should be taken into account** when comparing their lexical frequencies.
+Compared to the corpus [[en:cnk:SYN2005|SYN2005]], the SYN2010 corpus saw **significant improvements in lemmatization** and **[[en:pojmy:tag|morphological tagging]]**; both basically identical to the processing of the [[en:cnk:SYN2009PUB|SYN2009PUB]] corpus. Therefore, although [[en:cnk:SYN2005|SYN2005]] and SYN2010 do not differ in their understanding of [[en:pojmy:reprezentativnost|representativeness]], **these differences should be taken into account** when comparing their lexical frequencies.
  
 ====== Composition of SYN2010 ======