Differences
This shows you the differences between two versions of the page.
en:cnk:koditex [2018/06/05 11:23] – [The Koditex Corpus] petrapoukarova | en:cnk:koditex [2018/06/05 11:28] – [Chunks] petrapoukarova | ||
---|---|---|---|
Line 19: | Line 19: | ||
</ | </ | ||
- | When compiling the corpus, the primary goal was for it to be as diverse and representative as possible, reflecting the variability of Czech in all of its modes and ranges of use (written, spoken, online communication) and featuring rich annotation (the texts were [[en: | + | When compiling the corpus, the primary goal was for it to be as diverse and representative as possible, reflecting the variability of Czech in all of its modes and ranges of use (written, spoken, online communication) and featuring rich annotation (the texts were [[en: |
The name //Koditex// is both an acronym of the Czech version of the phrase // | The name //Koditex// is both an acronym of the Czech version of the phrase // | ||
Line 105: | Line 105: | ||
The majority of texts (accounting for 76% of tokens) included in the corpus are Czech originals (not translations from other languages). The only exceptions are text classes where translated material is common in Czech in general, listed in the table below (the rest of the classes are 100% Czech originals). | The majority of texts (accounting for 76% of tokens) included in the corpus are Czech originals (not translations from other languages). The only exceptions are text classes where translated material is common in Czech in general, listed in the table below (the rest of the classes are 100% Czech originals). | ||
- | ^ Class ^ Translations (words) ^ Originals (words) ^ % Translations | + | ^ Class ^ Translations (words) ^ Originals (words) ^ % translations |
| LOV | 210,250 | 30,981 | 87.2% | | | LOV | 210,250 | 30,981 | 87.2% | | ||
| CRM | 202,921 | 37,677 | 84.3% | | | CRM | 202,921 | 37,677 | 84.3% | |