en:cnk:uvod [2017/12/12 10:42] – [Corpora of the Czech National Corpus project] michalkren | en:cnk:uvod [2018/10/02 10:57] – [Corpora of the Czech National Corpus project] - ICv.11 alexandrrosen |
---|
^ corpus ^ size (word count) ^ lemmas ^ morphological tags ^ released((For versioned corpora (e.g. [[en:cnk:syn|SYN]] or [[en:cnk:intercorp|InterCorp]]), the year when the first version was released is stated.)) ^ characteristic features ^ | ^ corpus ^ size (word count) ^ lemmas ^ morphological tags ^ released((For versioned corpora (e.g. [[en:cnk:syn|SYN]] or [[en:cnk:intercorp|InterCorp]]), the year when the first version was released is stated.)) ^ characteristic features ^ |
| **General corpora** |||||| | | **General corpora** |||||| |
| [[en:cnk:syn|SYN]] ([[en:cnk:syn:verze5|version 5]]) | 3.836G | ✓ | ✓ | 2010 | versioned corpus, unification of all the SYN-series synchronic written corpora | | | [[en:cnk:syn|SYN]] ([[en:cnk:syn:verze6|version 6]]) | 4.033G | ✓ | ✓ | 2010 | versioned corpus, unification of all the SYN-series synchronic written corpora | |
| [[en:cnk:syn2015|SYN2015]] | 100M | ✓ | ✓ | 2015 | reference representative corpus, most of the texts are from 2010--2014, with new classification of texts | | | [[en:cnk:syn2015|SYN2015]] | 100M | ✓ | ✓ | 2015 | reference representative corpus, most of the texts are from 2010--2014, with new classification of texts | |
| [[en:cnk:syn2013PUB|SYN2013PUB]] | 935M | ✓ | ✓ | 2013 | reference corpus of newspapers and magazines from 2005--2009 | | | [[en:cnk:syn2013PUB|SYN2013PUB]] | 935M | ✓ | ✓ | 2013 | reference corpus of newspapers and magazines from 2005--2009 | |
^ corpus ^ size (word count) ^ lemmas ^ morphological tags ^ year ^ characteristic features ^ | ^ corpus ^ size (word count) ^ lemmas ^ morphological tags ^ year ^ characteristic features ^ |
| **Parallel corpora** |||||| | | **Parallel corpora** |||||| |
| [[en:cnk:intercorp|InterCorp]] ([[en:cnk:intercorp:verze10|version 10]]) | 1.48G | (✓) | (✓) | 2008 | versioned parallel corpus being compiled as a part of the [[http://ucnk.ff.cuni.cz/intercorp/?lang=en|InterCorp project]] | | | [[en:cnk:intercorp|InterCorp]] ([[en:cnk:intercorp:verze11|version 11]]) | 1.7G | (✓) | (✓) | 2008 | versioned parallel corpus being compiled as a part of the [[http://ucnk.ff.cuni.cz/intercorp/?lang=en|InterCorp project]] | |
| **Comparable corpora** |||||| | | **Comparable corpora** |||||| |
| [[en:cnk:aranea|Aranea]] | 1G | ✓ | ✓ | 2014 | comparable web corpora for several European languages (cs, de, en, es, fi, fr, hu, it, nl, pl, pt, ru, sk, zh) | | | [[en:cnk:aranea|Aranea]] | 1G | ✓ | ✓ | 2014 | comparable web corpora for several European languages (cs, de, en, es, fi, fr, hu, it, nl, pl, pt, ru, sk, zh) | |