Differences

This shows you the differences between two versions of the page.

en:cnk:uvod [2018/11/07 15:30]
Michal Křen [Corpora of the Czech National Corpus project]
en:cnk:uvod [2018/12/20 12:58] (current)
Michal Škrabal [Corpora of the Czech National Corpus project]
Line 7:
 ^ corpus ^ size (word count) ^  lemmas  ^ morphological tags ^  released((For versioned corpora (e.g. [[en:cnk:syn|SYN]] or [[en:cnk:intercorp|InterCorp]]), the year when the first version was released is stated.))  ^ characteristic features ^
 | **General corpora** ||||||
-| [[en:cnk:syn|SYN]] ([[en:cnk:syn:verze6|version 6]]) |  4.033G |  ✓  |  ✓  |  2010  | versioned corpus, unification of all the SYN-series synchronic written corpora |
+| [[en:cnk:syn|SYN]] ([[en:cnk:syn:verze7|version 7]]) |  4.255G |  ✓  |  ✓  |  2010  | versioned corpus, unification of all the SYN-series synchronic written corpora |
 | [[en:cnk:syn2015|SYN2015]] |  100M |  ✓  |  ✓  |  2015  | reference representative corpus, most of the texts are from 2010--2014, with new classification of texts |
 | [[en:cnk:syn2013PUB|SYN2013PUB]] |  935M |  ✓  |  ✓  |  2013  | reference corpus of newspapers and magazines from 2005--2009 |
Line 21:
 | [[en:cnk:fsc2000|FSC2000]] |  100M |  ✓  |  ✗  |  2004  | modified [[en:cnk:syn2000|SYN2000]], source of the Frequency Dictionary of Czech |
 | [[en:cnk:jerome|JEROME]] |  85M |  ✓  |  ✓  |  2013  | monolingual comparable corpus for translation studies |
+| [[en:cnk:koditex|Koditex]] |  10.8 mil. |  ✓  |  ✓  |  2018  | corpus for multi-dimensional analysis of Czech registers |
 | [[en:cnk:ksk-dopisy|KSK-DOPISY]] |  800k |  ✗  |  ✗  |  2006  | transcriptions of handwritten correspondence from 1990--2004|
 | [[en:cnk:link|LINK]] |  1.8M |  ✓  |  ✓  |  2010  | non-reference corpus of linguistic texts |