Differences

This shows you the differences between two versions of the page.

en:cnk:syn:verze7 [2018/12/20 13:00] Michal Škrabal
en:cnk:syn:verze7 [2018/12/20 13:50] (current) – [How to cite SYN version 7] Michal Křen
Line 8:
 ^ ::: ^ Number of [[en:pojmy:word|word forms]]  |  11 632 632 |
 ^ ::: ^ Number of [[en:pojmy:lemma|lemmas]] |  8 360 795 |
-^ [[en:pojmy:atributy_strukturni|Structures]] ^ Number of [[en:pojmy:doc|documents]] |  106 350  |
+^ [[en:pojmy:atributy_strukturni|Structures]] ^ Number of [[en:pojmy:doc|documents]] |  106 350 |
 ^ ::: ^ Number of [[en:pojmy:atributy_strukturni|texts]]|  16 377 839 |
 ^ ::: ^ Number of sentences |  325 540 933 |
Line 16:
 </WRAP>
 
-Every **SYN corpus** contains all the [[en:pojmy:synchronni|synchronic]] [[en:pojmy:psany|written]] corpora of the [[en:cnk:syn|SYN]] series published up until the time of the given version's publication. The corpus SYN version 7 therefore contains the corpora  [[en:cnk:syn2000|SYN2000]], [[en:cnk:syn2005|SYN2005]], [[en:cnk:syn2006pub|SYN2006PUB]], [[en:cnk:syn2009pub|SYN2009PUB]], [[en:cnk:syn2010|SYN2010]],[[en:cnk:syn2013pub|SYN2013PUB]] and [[cnk:syn2015|SYN2015]]; additionally, it contains a journalistic component predominantly from the years 2010–2014 (already included into [[en:cnk:syn:verze4|SYN version 4]], [[en:cnk:syn:verze5|SYN version 5]] and [[en:cnk:syn:verze6|SYN version 6]]) and as yet **unpublished journalistic texts from 2017** in yearly volume more than 265 mil. words.
+Every **SYN corpus** contains all the [[en:pojmy:synchronni|synchronic]] [[en:pojmy:psany|written]] corpora of the [[en:cnk:syn|SYN]] series published up until the time of the given version's publication. The corpus SYN version 7 therefore contains the corpora  [[en:cnk:syn2000|SYN2000]], [[en:cnk:syn2005|SYN2005]], [[en:cnk:syn2006pub|SYN2006PUB]], [[en:cnk:syn2009pub|SYN2009PUB]], [[en:cnk:syn2010|SYN2010]],[[en:cnk:syn2013pub|SYN2013PUB]] and [[cnk:syn2015|SYN2015]]; additionally, it contains a journalistic component predominantly from the years 2010–2014 (already included into [[en:cnk:syn:verze4|SYN version 4]], [[en:cnk:syn:verze5|SYN version 5]] and [[en:cnk:syn:verze6|SYN version 6]]) and as yet **unpublished journalistic texts from 2017** in yearly volume almost 200 mil. words.
 
 Because all of these corpora are **disjunctive** (i.e. they do not contain the same texts), the total size of the SYN version 7 is given by their sum, which makes 4.255 billion words ([[en:pojmy:token|tokens]] without punctuation). The SYN corpus is not [[en:pojmy:reprezentativnost|representative]]; the dominant component is journalism, which is the result of the predominance of journalistic corpora [[en:cnk:syn2006pub|SYN2006PUB]], [[en:cnk:syn2009pub|SYN2009PUB]], [[en:cnk:syn2013pub|SYN2013PUB]] and the journalistic component from the years 2010--2017.
Line 52:
 
 <WRAP round tip 70%>
-Křen, M. – Cvrček, V. – Čapka, T. – Čermáková, A. – Hnátková, M. – Chlumská, L. – Jelínek, T. – Kováříková, D. – Petkevič, V. – Procházka, P. – Skoumalová, H. – Škrabal, M. – Truneček, P. – Vondřička, P. – Zasina, A.: Corpus SYN, version 7 from 29. 11. 2018. Ústav Českého národního korpusu FF UK, Praha 2018. Available online: http://www.korpus.cz.
+Křen, M. – Cvrček, V. – Čapka, T. – Čermáková, A. – Hnátková, M. – Chlumská, L. – Jelínek, T. – Kováříková, D. – Petkevič, V. – Procházka, P. – Skoumalová, H. – Škrabal, M. – Truneček, P. – Vondřička, P. – Zasina, A.: //Corpus SYN, version 7 from 29. 11. 2018//. Ústav Českého národního korpusu FF UK, Praha 2018. Available online: http://www.korpus.cz.