Differences

This shows you the differences between two versions of the page.

en:cnk:syn:verze13 [2024/12/23 13:53] – [Journalism in SYN version 13] michalkren
en:cnk:syn:verze13 [2024/12/27 17:44] (current) – [SYN version 13] michalkren
Line 4: Line 4:
 <WRAP right 35%> <WRAP right 35%>
 ^ <fs medium>Name</fs> ^^ <fs medium>SYN version 13</fs> ^ ^ <fs medium>Name</fs> ^^ <fs medium>SYN version 13</fs> ^
-^ [[pojmy:atributy_pozicni|Position]] ^ Number of tokens |  6 238 142 297 |   +^ [[pojmy:atributy_pozicni|Position]] ^ Number of tokens |  6 400 899 055 |   
-^ ::: ^ Number of tokens without punctuation  |  5 174 701 189 |   +^ ::: ^ Number of tokens without punctuation  |  5 310 635 949 |   
-^ ::: ^ Number of [[en:pojmy:word|word forms]]  |  11 384 712 |   +^ ::: ^ Number of [[en:pojmy:word|word forms]]  |  11 522 926 |   
-^ ::: ^ Number of [[en:pojmy:lemma|lemmas]] |  7 604 956 +^ ::: ^ Number of [[en:pojmy:lemma|lemmas]] |  7 655 932 
-^ Structures ^ Number of documents |  144 755 +^ Structures ^ Number of documents |  151 076 
-^ ::: ^ Number of texts |  18 965 216 +^ ::: ^ Number of texts |  19 363 730 
-^ ::: ^ Number of sentences |  398 423 123 |+^ ::: ^ Number of sentences |  408 749 819 |
 ^ Other information ^ Referential |  YES |   ^ Other information ^ Referential |  YES |  
 ^ ::: ^ Representative |  NO (predominantly journalism) |   ^ ::: ^ Representative |  NO (predominantly journalism) |  
-^ ::: ^ Publication year |  2023 |+^ ::: ^ Publication year |  2024 |
 </WRAP> </WRAP>
  
-Every **SYN corpus** contains all the [[en:pojmy:synchronni|synchronic]] [[en:pojmy:psany|written]] corpora of the [[en:cnk:syn|SYN]] series published up until the time of the given version's publication. The corpus SYN version 13 therefore contains the [[en:cnk:syn2000|SYN2000]], [[en:cnk:syn2005|SYN2005]], [[en:cnk:syn2006pub|SYN2006PUB]], [[en:cnk:syn2009pub|SYN2009PUB]], [[en:cnk:syn2010|SYN2010]],[[en:cnk:syn2013pub|SYN2013PUB]], [[en:cnk:syn2015|SYN2015]] and [[en:cnk:syn2020|SYN2020]] corpora; additionally, it contains a journalistic component predominantly from 2010–2022 (already included into [[en:cnk:syn:verze4|SYN version 4]] -- [[en:cnk:syn:verze12|SYN version 12]]) corpora, and as yet **unpublished journalistic texts from 2023** in yearly volume almost 150 mil. words.+Every **SYN corpus** contains all the [[en:pojmy:synchronni|synchronic]] [[en:pojmy:psany|written]] corpora of the [[en:cnk:syn|SYN]] series published up until the time of the given version's publication. The corpus SYN version 13 therefore contains the [[en:cnk:syn2000|SYN2000]], [[en:cnk:syn2005|SYN2005]], [[en:cnk:syn2006pub|SYN2006PUB]], [[en:cnk:syn2009pub|SYN2009PUB]], [[en:cnk:syn2010|SYN2010]],[[en:cnk:syn2013pub|SYN2013PUB]], [[en:cnk:syn2015|SYN2015]] and [[en:cnk:syn2020|SYN2020]] corpora; additionally, it contains a journalistic component predominantly from 2010–2022 (already included into [[en:cnk:syn:verze4|SYN version 4]] -- [[en:cnk:syn:verze12|SYN version 12]]) corpora, and as yet **unpublished journalistic texts from 2023** in yearly volume of more than 100 mil. words.
  
 The SYN corpus is not [[en:pojmy:reprezentativnost|representative]]; the dominant component is journalism, which is the result of the predominance of journalistic corpora [[en:cnk:syn2006pub|SYN2006PUB]], [[en:cnk:syn2009pub|SYN2009PUB]], [[en:cnk:syn2013pub|SYN2013PUB]] and the journalistic component from 2010--2023. The SYN corpus is not [[en:pojmy:reprezentativnost|representative]]; the dominant component is journalism, which is the result of the predominance of journalistic corpora [[en:cnk:syn2006pub|SYN2006PUB]], [[en:cnk:syn2009pub|SYN2009PUB]], [[en:cnk:syn2013pub|SYN2013PUB]] and the journalistic component from 2010--2023.
Line 54: Line 54:
  
 <WRAP round tip 70%> <WRAP round tip 70%>
-Křen, M. – Cvrček, V. – Čapka, T. – Hnátková, M. – Jelínek, T. – Kocek, J. – Kováříková, D. – Křivan, J. – Milička, J. – Petkevič, V. – Skoumalová, H. – Šindlerová, J. – Škrabal, M.: //Corpus SYN, version 13 from 29. 12. 2024//. Ústav Českého národního korpusu FF UK, Praha 2024. Available online: https://www.korpus.cz.+Křen, M. – Cvrček, V. – Čapka, T. – Hnátková, M. – Jelínek, T. – Kocek, J. – Kováříková, D. – Křivan, J. – Milička, J. – Petkevič, V. – Skoumalová, H. – Šindlerová, J. – Škrabal, M.: //Corpus SYN, version 13 from 27. 12. 2024//. Ústav Českého národního korpusu FF UK, Praha 2024. Available online: https://www.korpus.cz.