Differences

This shows you the differences between two versions of the page.

en:cnk:uvod [2017/12/12 11:11] michalskrabal
en:cnk:uvod [2018/11/12 15:33] – [Corpora of the Czech National Corpus project] vaclavcvrcek
Line 21: Line 21:
 | [[en:cnk:fsc2000|FSC2000]] |  100M |  ✓  |  ✗  |  2004  | modified [[en:cnk:syn2000|SYN2000]], source of the Frequency Dictionary of Czech |
 | [[en:cnk:jerome|JEROME]] |  85M |  ✓  |  ✓  |  2013  | monolingual comparable corpus for translation studies |
+| [[en:cnk:koditex|Koditex]] |  10.8 mil. |  ✓  |  ✓  |  2018  | corpus for multi-dimensional analysis of Czech registers |
 | [[en:cnk:ksk-dopisy|KSK-DOPISY]] |  800k |  ✗  |  ✗  |  2006  | transcriptions of handwritten correspondence from 1990--2004|
 | [[en:cnk:link|LINK]] |  1.8M |  ✓  |  ✓  |  2010  | non-reference corpus of linguistic texts |
Line 46: Line 47:
 ^ corpus ^ size (word count) ^  lemmas  ^ morphological tags ^  year  ^ characteristic features ^
 | **Parallel corpora** ||||||
-| [[en:cnk:intercorp|InterCorp]] ([[en:cnk:intercorp:verze10|version 10]]) |  1.48G |  (✓)  |  (✓)  |  2008  | versioned parallel corpus being compiled as a part of the [[http://ucnk.ff.cuni.cz/intercorp/?lang=en|InterCorp project]] |
+| [[en:cnk:intercorp|InterCorp]] ([[en:cnk:intercorp:verze11|version 11]]) |  1.7G |  (✓)  |  (✓)  |  2008  | versioned parallel corpus being compiled as a part of the [[http://ucnk.ff.cuni.cz/intercorp/?lang=en|InterCorp project]] |
 | **Comparable corpora** ||||||
 | [[en:cnk:aranea|Aranea]] |  1G |  ✓  |  ✓  |  2014  | comparable web corpora for several European languages (cs, de, en, es, fi, fr, hu, it, nl, pl, pt, ru, sk, zh) |
Line 58: Line 59:
 | [[en:cnk:hotko|HOTKO]] |  36M |  ✗  |  ✗  |  2013  | non-reference corpus of Upper Sorbian |
 | [[en:cnk:lEstRepublicain|lEstRepublicain]] |  73M |  ✓  |  ✓  |  2013  | corpus of French newspaper L'Est Républicain |
+| [[en:cnk:nkjp|NKJP_1M]] |  1M |  ✓  |  ✓  |  2018  | manually annotated one-million subcorpus of the National Corpus of Polish |