Differences

This shows you the differences between two versions of the page.

en:cnk:syn:verze5 [2017/04/25 13:20] – created michalskrabal
en:cnk:syn:verze5 [2017/04/26 13:58] – [Journalism in SYN version 5] michalkren
Line 16: Line 16:
 </WRAP> </WRAP>
  
-Every **SYN corpus** contains all the [[en:pojmy:synchronni|synchronic]] [[en:pojmy:psany|written]] corpora of the [[en:cnk:syn|SYN]] series published up until the time of the given version's publication. The corpus SYN version therefore contains the corpora  [[en:cnk:syn2000|SYN2000]], [[en:cnk:syn2005|SYN2005]], [[en:cnk:syn2006pub|SYN2006PUB]], [[en:cnk:syn2009pub|SYN2009PUB]], [[en:cnk:syn2010|SYN2010]],[[en:cnk:syn2013pub|SYN2013PUB]] and [[cnk:syn2015|SYN2015]]; it additionally contains an **as yet unpublished journalistic component predominantly from the years 2010–2015** in yearly volumes exceeding 200 mil. words.
+Every **SYN corpus** contains all the [[en:pojmy:synchronni|synchronic]] [[en:pojmy:psany|written]] corpora of the [[en:cnk:syn|SYN]] series published up until the time of the given version's publication. The corpus SYN version therefore contains the corpora  [[en:cnk:syn2000|SYN2000]], [[en:cnk:syn2005|SYN2005]], [[en:cnk:syn2006pub|SYN2006PUB]], [[en:cnk:syn2009pub|SYN2009PUB]], [[en:cnk:syn2010|SYN2010]],[[en:cnk:syn2013pub|SYN2013PUB]] and [[cnk:syn2015|SYN2015]]; additionally, it contains journalistic component predominantly from the years 2010–2014 (already included into [[en:cnk:syn:verze4|SYN version 4]]) and as yet **unpublished journalistic texts from 2015** in yearly volume exceeding 200 mil. words.
  
 Because all of these corpora are **disjunctive** (i.e. they do not contain the same texts), the total size of the SYN version 5 is given by their sum, which makes 3,626 FIXME billion words ([[en:pojmy:token|tokens]] without punctuation). The SYN corpus is not [[en:pojmy:reprezentativnost|representative]]; the dominant component is journalism, which is the result of the predominance of journalistic corpora [[en:cnk:syn2006pub|SYN2006PUB]], [[en:cnk:syn2009pub|SYN2009PUB]], [[en:cnk:syn2013pub|SYN2013PUB]] and the journalistic component from the years 2010–2015.
Line 38: Line 38:
  
  
-The composition of the journalistic part of the corpus SYN version 5 covers the production of most of the national daily newspapers  (//Mladá fronta DNES, Lidové noviny, Právo, Hospodářské noviny, Blesk, Sport//), regional daily newspapers (chiefly //Deníky Bohemia// and //Moravia// published by Vltava Labe Media) and non-specialized magazines (//Reflex, Respekt, Týden//) from the years 1998--2014; the total number of journalistic titles is 176 FIXME. The following graphs show the composition of the SYN corpus based on the [[en:pojmy:txtype_group|main text types]] over the years and offer a closer look at the composition of the journalistic section.
+The composition of the journalistic part of the corpus SYN version 5 covers the production of most of the national daily newspapers  (//Mladá fronta DNES, Lidové noviny, Právo, Hospodářské noviny, Blesk, Sport//), regional daily newspapers (chiefly //Deníky Bohemia// and //Moravia// published by Vltava Labe Media) and non-specialized magazines (//Reflex, Respekt, Týden//) from the years 1998--2014; the total number of journalistic titles is 176. The following graphs show the composition of the SYN corpus based on the [[en:pojmy:txtype_group|main text types]] over the years and offer a closer look at the composition of the journalistic section.
  
-[{{:cnk:slozeni_syn_v4.png?400|Composition of the corpus SYN version 5}}]
+[{{:cnk:slozeni_syn_v5.png?400|Composition of the corpus SYN version 5}}]
 
-[{{:cnk:slozeni_syn_v4_pub.png?400|Composition of the journalistic part of the corpus SYN version 5}}]
+[{{:cnk:slozeni_syn_v5_pub.png?400|Composition of the journalistic part of the corpus SYN version 5}}]
-
-FIXME
  
 ====== Structure and annotation of the corpus SYN version 5 ======
Line 57: Line 55:
  
 <WRAP round tip 70%>
-Křen, M. – Cvrček, V. – Čapka, T. – Čermáková, A. – Hnátková, M. – Chlumská, L. – Jelínek, T. – Kováříková, D. – Petkevič, V. – Procházka, P. – Skoumalová, H. – Škrabal, M. – Truneček, P. – Vondřička, P. – Zasina, A.: Corpus SYN, verze 24. 4. 2017. Ústav Českého národního korpusu FF UK, Praha 2016. Available online: http://www.korpus.cz.
+Křen, M. – Cvrček, V. – Čapka, T. – Čermáková, A. – Hnátková, M. – Chlumská, L. – Jelínek, T. – Kováříková, D. – Petkevič, V. – Procházka, P. – Skoumalová, H. – Škrabal, M. – Truneček, P. – Vondřička, P. – Zasina, A.: Corpus SYN, version from 24. 4. 2017. Ústav Českého národního korpusu FF UK, Praha 2017. Available online: http://www.korpus.cz.
 
 
 Hnátková, M. – Křen, M. – Procházka, P. – Skoumalová, H. (2014): [[http://www.lrec-conf.org/proceedings/lrec2014/pdf/294_Paper.pdf|The SYN-series corpora of written Czech]]. In //Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)//, 160–164. Reykjavík: ELRA. ISBN 978-2-9517408-8-4.
-
-FIXME keep, also in line with the Czech version?
 </WRAP>
- 
  
  
Line 71: Line 66:
 ====== Related links ======
 <WRAP round box 50%>
-[[en:cnk:syn|SYN]] • [[en:cnk:syn:verze3|SYN version 3]] • [[en:cnk:syn2000|SYN2000]] • [[en:cnk:syn2005|SYN2005]] • [[en:cnk:syn2006pub|SYN2006PUB]] • [[en:cnk:syn2009pub|SYN2009PUB]] • [[en:cnk:syn2010|SYN2010]] • [[en:cnk:SYN2013PUB|SYN2013PUB]] • [[en:cnk:syn2015|SYN2015]]
+[[en:cnk:syn|SYN]] • [[en:cnk:syn:verze4|SYN version 4]] • [[en:cnk:syn:verze3|SYN version 3]] • [[en:cnk:syn2000|SYN2000]] • [[en:cnk:syn2005|SYN2005]] • [[en:cnk:syn2006pub|SYN2006PUB]] • [[en:cnk:syn2009pub|SYN2009PUB]] • [[en:cnk:syn2010|SYN2010]] • [[en:cnk:SYN2013PUB|SYN2013PUB]] • [[en:cnk:syn2015|SYN2015]]
 
 </WRAP>