en:cnk:syn:verze7 [2018/12/20 13:00] – michalskrabal | en:cnk:syn:verze7 [2018/12/20 13:50] (current) – [How to cite SYN version 7] michalkren |
---|
^ ::: ^ Number of [[en:pojmy:word|word forms]] | 11 632 632 | | ^ ::: ^ Number of [[en:pojmy:word|word forms]] | 11 632 632 | |
^ ::: ^ Number of [[en:pojmy:lemma|lemmas]] | 8 360 795 | | ^ ::: ^ Number of [[en:pojmy:lemma|lemmas]] | 8 360 795 | |
^ [[en:pojmy:atributy_strukturni|Structures]] ^ Number of [[en:pojmy:doc|documents]] | 106 350 | | ^ [[en:pojmy:atributy_strukturni|Structures]] ^ Number of [[en:pojmy:doc|documents]] | 106 350 | |
^ ::: ^ Number of [[en:pojmy:atributy_strukturni|texts]]| 16 377 839 | | ^ ::: ^ Number of [[en:pojmy:atributy_strukturni|texts]]| 16 377 839 | |
^ ::: ^ Number of sentences | 325 540 933 | | ^ ::: ^ Number of sentences | 325 540 933 | |
</WRAP> | </WRAP> |
| |
Every **SYN corpus** contains all the [[en:pojmy:synchronni|synchronic]] [[en:pojmy:psany|written]] corpora of the [[en:cnk:syn|SYN]] series published up to the time of that version's release. SYN version 7 therefore contains the corpora [[en:cnk:syn2000|SYN2000]], [[en:cnk:syn2005|SYN2005]], [[en:cnk:syn2006pub|SYN2006PUB]], [[en:cnk:syn2009pub|SYN2009PUB]], [[en:cnk:syn2010|SYN2010]], [[en:cnk:syn2013pub|SYN2013PUB]] and [[cnk:syn2015|SYN2015]]; additionally, it contains a journalistic component predominantly from the years 2010–2014 (already included in [[en:cnk:syn:verze4|SYN version 4]], [[en:cnk:syn:verze5|SYN version 5]] and [[en:cnk:syn:verze6|SYN version 6]]) and as yet **unpublished journalistic texts from 2017** with a yearly volume of more than 265 million words. | Every **SYN corpus** contains all the [[en:pojmy:synchronni|synchronic]] [[en:pojmy:psany|written]] corpora of the [[en:cnk:syn|SYN]] series published up to the time of that version's release. SYN version 7 therefore contains the corpora [[en:cnk:syn2000|SYN2000]], [[en:cnk:syn2005|SYN2005]], [[en:cnk:syn2006pub|SYN2006PUB]], [[en:cnk:syn2009pub|SYN2009PUB]], [[en:cnk:syn2010|SYN2010]], [[en:cnk:syn2013pub|SYN2013PUB]] and [[cnk:syn2015|SYN2015]]; additionally, it contains a journalistic component predominantly from the years 2010–2014 (already included in [[en:cnk:syn:verze4|SYN version 4]], [[en:cnk:syn:verze5|SYN version 5]] and [[en:cnk:syn:verze6|SYN version 6]]) and as yet **unpublished journalistic texts from 2017** with a yearly volume of almost 200 million words. |
| |
Because all of these corpora are **disjoint** (i.e. they do not contain the same texts), the total size of SYN version 7 is the sum of their sizes, which is 4.255 billion words ([[en:pojmy:token|tokens]] without punctuation). The SYN corpus is not [[en:pojmy:reprezentativnost|representative]]; journalism is the dominant component because of the size of the journalistic corpora [[en:cnk:syn2006pub|SYN2006PUB]], [[en:cnk:syn2009pub|SYN2009PUB]] and [[en:cnk:syn2013pub|SYN2013PUB]] and of the journalistic component from the years 2010--2017. | Because all of these corpora are **disjoint** (i.e. they do not contain the same texts), the total size of SYN version 7 is the sum of their sizes, which is 4.255 billion words ([[en:pojmy:token|tokens]] without punctuation). The SYN corpus is not [[en:pojmy:reprezentativnost|representative]]; journalism is the dominant component because of the size of the journalistic corpora [[en:cnk:syn2006pub|SYN2006PUB]], [[en:cnk:syn2009pub|SYN2009PUB]] and [[en:cnk:syn2013pub|SYN2013PUB]] and of the journalistic component from the years 2010--2017. |
| |
<WRAP round tip 70%> | <WRAP round tip 70%> |
Křen, M. – Cvrček, V. – Čapka, T. – Čermáková, A. – Hnátková, M. – Chlumská, L. – Jelínek, T. – Kováříková, D. – Petkevič, V. – Procházka, P. – Skoumalová, H. – Škrabal, M. – Truneček, P. – Vondřička, P. – Zasina, A.: Corpus SYN, version 7 from 29. 11. 2018. Ústav Českého národního korpusu FF UK, Praha 2018. Available online: http://www.korpus.cz. | Křen, M. – Cvrček, V. – Čapka, T. – Čermáková, A. – Hnátková, M. – Chlumská, L. – Jelínek, T. – Kováříková, D. – Petkevič, V. – Procházka, P. – Skoumalová, H. – Škrabal, M. – Truneček, P. – Vondřička, P. – Zasina, A.: //Corpus SYN, version 7 from 29. 11. 2018//. Ústav Českého národního korpusu FF UK, Praha 2018. Available online: http://www.korpus.cz. |
| |
| |