Next revision | Previous revision |
en:cnk:syn:verze13 [2024/12/20 12:03] – created michalkren | en:cnk:syn:verze13 [2024/12/27 17:44] (current) – [SYN version 13] michalkren |
---|
<WRAP right 35%> | <WRAP right 35%> |
^ <fs medium>Name</fs> ^^ <fs medium>SYN version 13</fs> ^ | ^ <fs medium>Name</fs> ^^ <fs medium>SYN version 13</fs> ^ |
^ [[pojmy:atributy_pozicni|Position]] ^ Number of tokens | 6 238 142 297 | | ^ [[pojmy:atributy_pozicni|Position]] ^ Number of tokens | 6 400 899 055 | |
^ ::: ^ Number of tokens without punctuation | 5 174 701 189 | | ^ ::: ^ Number of tokens without punctuation | 5 310 635 949 | |
^ ::: ^ Number of [[en:pojmy:word|word forms]] | 11 384 712 | | ^ ::: ^ Number of [[en:pojmy:word|word forms]] | 11 522 926 | |
^ ::: ^ Number of [[en:pojmy:lemma|lemmas]] | 7 604 956 | | ^ ::: ^ Number of [[en:pojmy:lemma|lemmas]] | 7 655 932 | |
^ Structures ^ Number of documents | 144 755 | | ^ Structures ^ Number of documents | 151 076 | |
^ ::: ^ Number of texts | 18 965 216 | | ^ ::: ^ Number of texts | 19 363 730 | |
^ ::: ^ Number of sentences | 398 423 123 | | ^ ::: ^ Number of sentences | 408 749 819 | |
^ Other information ^ Referential | YES | | ^ Other information ^ Referential | YES | |
^ ::: ^ Representative | NO (predominantly journalism) | | ^ ::: ^ Representative | NO (predominantly journalism) | |
^ ::: ^ Publication year | 2023 | | ^ ::: ^ Publication year | 2024 | |
</WRAP> | </WRAP> |
| |
Every **SYN corpus** contains all the [[en:pojmy:synchronni|synchronic]] [[en:pojmy:psany|written]] corpora of the [[en:cnk:syn|SYN]] series published up to the time of that version's publication. SYN version 13 therefore contains the [[en:cnk:syn2000|SYN2000]], [[en:cnk:syn2005|SYN2005]], [[en:cnk:syn2006pub|SYN2006PUB]], [[en:cnk:syn2009pub|SYN2009PUB]], [[en:cnk:syn2010|SYN2010]], [[en:cnk:syn2013pub|SYN2013PUB]], [[en:cnk:syn2015|SYN2015]] and [[en:cnk:syn2020|SYN2020]] corpora; in addition, it contains a journalistic component predominantly from 2010–2022 (already included in the corpora [[en:cnk:syn:verze4|SYN version 4]] -- [[en:cnk:syn:verze12|SYN version 12]]) and previously **unpublished journalistic texts from 2023** with a yearly volume of almost 150 million words. | Every **SYN corpus** contains all the [[en:pojmy:synchronni|synchronic]] [[en:pojmy:psany|written]] corpora of the [[en:cnk:syn|SYN]] series published up to the time of that version's publication. SYN version 13 therefore contains the [[en:cnk:syn2000|SYN2000]], [[en:cnk:syn2005|SYN2005]], [[en:cnk:syn2006pub|SYN2006PUB]], [[en:cnk:syn2009pub|SYN2009PUB]], [[en:cnk:syn2010|SYN2010]], [[en:cnk:syn2013pub|SYN2013PUB]], [[en:cnk:syn2015|SYN2015]] and [[en:cnk:syn2020|SYN2020]] corpora; in addition, it contains a journalistic component predominantly from 2010–2022 (already included in the corpora [[en:cnk:syn:verze4|SYN version 4]] -- [[en:cnk:syn:verze12|SYN version 12]]) and previously **unpublished journalistic texts from 2023** with a yearly volume of more than 100 million words. |
| |
The SYN corpus is not [[en:pojmy:reprezentativnost|representative]]; its dominant component is journalism, a result of the predominance of the journalistic corpora [[en:cnk:syn2006pub|SYN2006PUB]], [[en:cnk:syn2009pub|SYN2009PUB]] and [[en:cnk:syn2013pub|SYN2013PUB]] and of the journalistic component from 2010--2023. | The SYN corpus is not [[en:pojmy:reprezentativnost|representative]]; its dominant component is journalism, a result of the predominance of the journalistic corpora [[en:cnk:syn2006pub|SYN2006PUB]], [[en:cnk:syn2009pub|SYN2009PUB]] and [[en:cnk:syn2013pub|SYN2013PUB]] and of the journalistic component from 2010--2023. |
| |
<WRAP round tip 70%> | <WRAP round tip 70%> |
Křen, M. – Cvrček, V. – Čapka, T. – Hnátková, M. – Jelínek, T. – Kocek, J. – Kováříková, D. – Křivan, J. – Milička, J. – Petkevič, V. – Skoumalová, H. – Šindlerová, J. – Škrabal, M.: //Corpus SYN, version 13 from 29. 12. 2024//. Ústav Českého národního korpusu FF UK, Praha 2024. Available online: https://www.korpus.cz. | Křen, M. – Cvrček, V. – Čapka, T. – Hnátková, M. – Jelínek, T. – Kocek, J. – Kováříková, D. – Křivan, J. – Milička, J. – Petkevič, V. – Skoumalová, H. – Šindlerová, J. – Škrabal, M.: //Corpus SYN, version 13 from 27. 12. 2024//. Ústav Českého národního korpusu FF UK, Praha 2024. Available online: https://www.korpus.cz. |
| |
| |