Differences

This shows you the differences between two versions of the page.

en:cnk:syn:verze3 [2016/12/11 16:48]
Veronika Pojarová [How to cite SYN version 3]
en:cnk:syn:verze3 [2017/04/21 11:01] (current)
Michal Škrabal [The composition of the SYN version 3 corpus]
Line 26:
 ^ <fs medium>Referential written language corpora (synchronic and general) ordered by date of creation</fs> ^^^^^^
 ^ corpus ^ size (words) ^ [[en:pojmy:lemma|lemmatization]] ^ [[en:pojmy:tag|morphological tags]] ^ publication year ^ corpus description ^
-^ [[en:cnk:syn2013PUB|SYN2013PUB]] | 935 mil. |  YES  |  YES  |  2013  | corpus of journalistic texts from the years 2005-2009 |
+^ [[en:cnk:syn2013PUB|SYN2013PUB]] | 935 mil. |  ✓  |  ✓  |  2013  | corpus of journalistic texts from the years 2005-2009 |
-^ [[en:cnk:syn2010|SYN2010]] | 100 mil. |  YES  |  YES  |  2010  | representative corpus, mainly texts from the years  2005–2009|
+^ [[en:cnk:syn2010|SYN2010]] | 100 mil. |  ✓  |  ✓  |  2010  | representative corpus, mainly texts from the years  2005–2009|
-^ [[en:cnk:syn2009PUB|SYN2009PUB]] | 700 mil. |  YES  |  YES  |  2010  | corpus of journalistic texts from the years 1995–2007 |
+^ [[en:cnk:syn2009PUB|SYN2009PUB]] | 700 mil. |  ✓  |  ✓  |  2010  | corpus of journalistic texts from the years 1995–2007 |
-^ [[en:cnk:syn2006PUB|SYN2006PUB]] | 300 mil. |  YES  |  YES  |  2006  | corpus of journalistic texts from the years 1989–2004|
+^ [[en:cnk:syn2006PUB|SYN2006PUB]] | 300 mil. |  ✓  |  ✓  |  2006  | corpus of journalistic texts from the years 1989–2004|
-^ [[en:cnk:syn2005|SYN2005]] | 100 mil. |  YES  |  YES  |  2005  | representative corpus, mainly texts from the years  2000–2004|
+^ [[en:cnk:syn2005|SYN2005]] | 100 mil. |  ✓  |  ✓  |  2005  | representative corpus, mainly texts from the years  2000–2004|
-^ [[en:cnk:syn2000|SYN2000]] | 100 mil. |  YES  |  YES  |  2000  | representative corpus, mainly texts from the years 1990–1999|
+^ [[en:cnk:syn2000|SYN2000]] | 100 mil. |  ✓  |  ✓  |  2000  | representative corpus, mainly texts from the years 1990–1999|
  
 The journalistic part of the corpus SYN version 3 covers the output of most of the national daily newspapers (Mladá fronta DNES, Lidové noviny, Právo, Hospodářské noviny, Blesk) and non-specialized magazines (Reflex, Respekt, Týden) from the years 1998--2009. A table of the 15 titles most represented in the journalistic part of the corpus SYN version 3 (broken down by individual years; the numbers are in millions of words, i.e. positions not counting punctuation) can be downloaded below; a preview of the composition of the journalistic part can be seen in the following graph.