en:cnk:uvod [2018/10/02 10:57] – [Corpora of the Czech National Corpus project] - ICv.11 alexandrrosen | en:cnk:uvod [2018/12/20 12:58] – [Corpora of the Czech National Corpus project] michalskrabal |
---|
^ corpus ^ size (word count) ^ lemmas ^ morphological tags ^ released((For versioned corpora (e.g. [[en:cnk:syn|SYN]] or [[en:cnk:intercorp|InterCorp]]), the year when the first version was released is stated.)) ^ characteristic features ^ | ^ corpus ^ size (word count) ^ lemmas ^ morphological tags ^ released((For versioned corpora (e.g. [[en:cnk:syn|SYN]] or [[en:cnk:intercorp|InterCorp]]), the year when the first version was released is stated.)) ^ characteristic features ^ |
| **General corpora** |||||| | | **General corpora** |||||| |
| [[en:cnk:syn|SYN]] ([[en:cnk:syn:verze6|version 6]]) | 4.033G | ✓ | ✓ | 2010 | versioned corpus, unification of all the SYN-series synchronic written corpora | | | [[en:cnk:syn|SYN]] ([[en:cnk:syn:verze7|version 7]]) | 4.255G | ✓ | ✓ | 2010 | versioned corpus, unification of all the SYN-series synchronic written corpora | |
| [[en:cnk:syn2015|SYN2015]] | 100M | ✓ | ✓ | 2015 | reference representative corpus, most of the texts are from 2010--2014, with new classification of texts | | | [[en:cnk:syn2015|SYN2015]] | 100M | ✓ | ✓ | 2015 | reference representative corpus, most of the texts are from 2010--2014, with new classification of texts | |
| [[en:cnk:syn2013PUB|SYN2013PUB]] | 935M | ✓ | ✓ | 2013 | reference corpus of newspapers and magazines from 2005--2009 | | | [[en:cnk:syn2013PUB|SYN2013PUB]] | 935M | ✓ | ✓ | 2013 | reference corpus of newspapers and magazines from 2005--2009 | |
| [[en:cnk:fsc2000|FSC2000]] | 100M | ✓ | ✗ | 2004 | modified [[en:cnk:syn2000|SYN2000]], source of the Frequency Dictionary of Czech | | | [[en:cnk:fsc2000|FSC2000]] | 100M | ✓ | ✗ | 2004 | modified [[en:cnk:syn2000|SYN2000]], source of the Frequency Dictionary of Czech | |
| [[en:cnk:jerome|JEROME]] | 85M | ✓ | ✓ | 2013 | monolingual comparable corpus for translation studies | | | [[en:cnk:jerome|JEROME]] | 85M | ✓ | ✓ | 2013 | monolingual comparable corpus for translation studies | |
| | [[en:cnk:koditex|Koditex]] | 10.8M | ✓ | ✓ | 2018 | corpus for multi-dimensional analysis of Czech registers | |
| [[en:cnk:ksk-dopisy|KSK-DOPISY]] | 800k | ✗ | ✗ | 2006 | transcriptions of handwritten correspondence from 1990--2004 | | | [[en:cnk:ksk-dopisy|KSK-DOPISY]] | 800k | ✗ | ✗ | 2006 | transcriptions of handwritten correspondence from 1990--2004 | |
| [[en:cnk:link|LINK]] | 1.8M | ✓ | ✓ | 2010 | non-reference corpus of linguistic texts | | | [[en:cnk:link|LINK]] | 1.8M | ✓ | ✓ | 2010 | non-reference corpus of linguistic texts | |
| [[en:cnk:hotko|HOTKO]] | 36M | ✗ | ✗ | 2013 | non-reference corpus of Upper Sorbian | | | [[en:cnk:hotko|HOTKO]] | 36M | ✗ | ✗ | 2013 | non-reference corpus of Upper Sorbian | |
| [[en:cnk:lEstRepublicain|lEstRepublicain]] | 73M | ✓ | ✓ | 2013 | corpus of the French newspaper L'Est Républicain | | | [[en:cnk:lEstRepublicain|lEstRepublicain]] | 73M | ✓ | ✓ | 2013 | corpus of the French newspaper L'Est Républicain | |
| | [[en:cnk:nkjp|NKJP_1M]] | 1M | ✓ | ✓ | 2018 | manually annotated one-million subcorpus of the National Corpus of Polish | |