en:cnk:uvod [2018/12/20 12:58] – [Corpora of the Czech National Corpus project] michalskrabal | en:cnk:uvod [2019/11/07 17:09] – [Corpora of the Czech National Corpus project] michalkren |
---|
| [[en:cnk:czesl-plain|CZESL-PLAIN]] | 2M | ✗ | ✗ | 2012 | non-reference learner corpus of non-native Czech speakers | | | [[en:cnk:czesl-plain|CZESL-PLAIN]] | 2M | ✗ | ✗ | 2012 | non-reference learner corpus of non-native Czech speakers | |
| [[en:cnk:czesl-sgt|CZESL-SGT]] | 960k | ✓ | ✓ | 2014 | non-reference learner corpus of non-native speakers’ Czech with automatic annotation | | | [[en:cnk:czesl-sgt|CZESL-SGT]] | 960k | ✓ | ✓ | 2014 | non-reference learner corpus of non-native speakers’ Czech with automatic annotation | |
| | [[en:cnk:czesl-sgt-basic|CZESL-SGT-BASIC]] | 960k | ✓ | ✓ | 2019 | CZESL-SGT with a reduced set of metadata in the Restrict search section of the search interface | |
| [[en:cnk:fictree|FicTree]] | 135k | ✓ | ✓ | 2017 | manually annotated treebank of Czech fiction | | | [[en:cnk:fictree|FicTree]] | 135k | ✓ | ✓ | 2017 | manually annotated treebank of Czech fiction | |
| [[en:cnk:fsc2000|FSC2000]] | 100M | ✓ | ✗ | 2004 | modified [[en:cnk:syn2000|SYN2000]], source of the Frequency Dictionary of Czech | | | [[en:cnk:fsc2000|FSC2000]] | 100M | ✓ | ✗ | 2004 | modified [[en:cnk:syn2000|SYN2000]], source of the Frequency Dictionary of Czech | |
| [[en:cnk:intercorp|InterCorp]] ([[en:cnk:intercorp:verze11|version 11]]) | 1.7G | (✓) | (✓) | 2008 | versioned parallel corpus being compiled as a part of the [[http://ucnk.ff.cuni.cz/intercorp/?lang=en|InterCorp project]] | | | [[en:cnk:intercorp|InterCorp]] ([[en:cnk:intercorp:verze11|version 11]]) | 1.7G | (✓) | (✓) | 2008 | versioned parallel corpus being compiled as a part of the [[http://ucnk.ff.cuni.cz/intercorp/?lang=en|InterCorp project]] | |
| **Comparable corpora** |||||| | | **Comparable corpora** |||||| |
| [[en:cnk:aranea|Aranea]] | 1G | ✓ | ✓ | 2014 | comparable web corpora for several European languages (cs, de, en, es, fi, fr, hu, it, nl, pl, pt, ru, sk, zh) | | | [[en:cnk:aranea|Aranea]] | 1G | ✓ | ✓ | 2014 | comparable web corpora for several languages (cs, de, en, es, fi, fr, hu, it, nl, pl, pt, ru, sk, zh) | |
| [[en:cnk:dewac|deWaC]] | 1.35G | ✓ | ✓ | 2013 | web corpus of German | | | [[en:cnk:dewac|deWaC]] | 1.35G | ✓ | ✓ | 2013 | web corpus of German | |
| [[en:cnk:frwac|frWaC]] | 1.35G | ✓ | ✓ | 2013 | web corpus of French | | | [[en:cnk:frwac|frWaC]] | 1.35G | ✓ | ✓ | 2013 | web corpus of French | |