
en:cnk:syn2010 [2016/12/11 16:26] – [Corpus SYN2010] Veronika Pojarová
en:cnk:syn2010 [2016/12/11 16:27] (current) Veronika Pojarová
  
SYN2010 is a synchronic representative corpus of written Czech comprising 100 million tokens. It is a sequel to the corpora [[en:cnk:SYN2000]] and [[en:cnk:SYN2005]] and, together with them, forms a series of synchronic representative corpora covering three successive periods.
**All three corpora contain different texts and are therefore disjoint.** The basic characteristics of SYN2010 are identical to those of [[en:cnk:SYN2005|SYN2005]], mainly because both follow the same conception of [[en:pojmy:reprezentativnost|representativeness]], based on the reception of written language, and the corpus composition that results from it. SYN2010 is [[en:pojmy:lemma|lemmatized]] and [[en:pojmy:tag|morphologically tagged]].
  
  
====== Changes compared to the SYN2005 corpus ======
  
Compared to [[en:cnk:SYN2005|SYN2005]], SYN2010 features **significantly improved lemmatization** and **[[en:pojmy:tag|morphological tagging]]**, both essentially identical to the processing of the [[en:cnk:SYN2009PUB|SYN2009PUB]] corpus. Therefore, although [[en:cnk:SYN2005|SYN2005]] and SYN2010 share the same understanding of [[en:pojmy:reprezentativnost|representativeness]], **these differences in processing should be taken into account** when comparing their lexical frequencies.
  
====== Composition of SYN2010 ======