en:cnk:uvod [2018/11/12 15:33] – [Corpora of the Czech National Corpus project] vaclavcvrcek | en:cnk:uvod [2019/10/31 19:24] – alexandrrosen |
---|
^ corpus ^ size (word count) ^ lemmas ^ morphological tags ^ released((For versioned corpora (e.g. [[en:cnk:syn|SYN]] or [[en:cnk:intercorp|InterCorp]]), the year when the first version was released is stated.)) ^ characteristic features ^ | ^ corpus ^ size (word count) ^ lemmas ^ morphological tags ^ released((For versioned corpora (e.g. [[en:cnk:syn|SYN]] or [[en:cnk:intercorp|InterCorp]]), the year when the first version was released is stated.)) ^ characteristic features ^ |
| **General corpora** |||||| | | **General corpora** |||||| |
| [[en:cnk:syn|SYN]] ([[en:cnk:syn:verze6|version 6]]) | 4.033G | ✓ | ✓ | 2010 | versioned corpus, unification of all the SYN-series synchronic written corpora | | | [[en:cnk:syn|SYN]] ([[en:cnk:syn:verze7|version 7]]) | 4.255G | ✓ | ✓ | 2010 | versioned corpus, unification of all the SYN-series synchronic written corpora | |
| [[en:cnk:syn2015|SYN2015]] | 100M | ✓ | ✓ | 2015 | reference representative corpus, most of the texts are from 2010--2014, with new classification of texts | | | [[en:cnk:syn2015|SYN2015]] | 100M | ✓ | ✓ | 2015 | reference representative corpus, most of the texts are from 2010--2014, with new classification of texts | |
| [[en:cnk:syn2013PUB|SYN2013PUB]] | 935M | ✓ | ✓ | 2013 | reference corpus of newspapers and magazines from 2005--2009 | | | [[en:cnk:syn2013PUB|SYN2013PUB]] | 935M | ✓ | ✓ | 2013 | reference corpus of newspapers and magazines from 2005--2009 | |
| [[en:cnk:czesl-plain|CZESL-PLAIN]] | 2M | ✗ | ✗ | 2012 | non-reference learner corpus of non-native Czech speakers | | | [[en:cnk:czesl-plain|CZESL-PLAIN]] | 2M | ✗ | ✗ | 2012 | non-reference learner corpus of non-native Czech speakers | |
| [[en:cnk:czesl-sgt|CZESL-SGT]] | 960k | ✓ | ✓ | 2014 | non-reference learner corpus of non-native speakers’ Czech with automatic annotation | | | [[en:cnk:czesl-sgt|CZESL-SGT]] | 960k | ✓ | ✓ | 2014 | non-reference learner corpus of non-native speakers’ Czech with automatic annotation | |
| | [[en:cnk:czesl-sgt-basic|CZESL-SGT-BASIC]] | 960k | ✓ | ✓ | 2019 | same as CZESL-SGT except for a reduced set of metadata in the **Restrict search** section of the search interface | |
| [[en:cnk:fictree|FicTree]] | 135k | ✓ | ✓ | 2017 | manually annotated treebank of Czech fiction | | | [[en:cnk:fictree|FicTree]] | 135k | ✓ | ✓ | 2017 | manually annotated treebank of Czech fiction | |
| [[en:cnk:fsc2000|FSC2000]] | 100M | ✓ | ✗ | 2004 | modified [[en:cnk:syn2000|SYN2000]], source of the Frequency Dictionary of Czech | | | [[en:cnk:fsc2000|FSC2000]] | 100M | ✓ | ✗ | 2004 | modified [[en:cnk:syn2000|SYN2000]], source of the Frequency Dictionary of Czech | |