Differences
This shows you the differences between two versions of the page.
en:cnk:koditex [2018/06/05 11:25] – [The Koditex Corpus] Petra Poukarová | en:cnk:koditex [2018/06/05 11:28] – [Chunks] Petra Poukarová | ||
---|---|---|---|
Line 105: | Line 105: | ||
The majority of texts (accounting for 76% of tokens) included in the corpus are Czech originals (not translations from other languages). The only exceptions are text classes where translated material is common in Czech in general, listed in the table below (the rest of the classes are 100% Czech originals). | The majority of texts (accounting for 76% of tokens) included in the corpus are Czech originals (not translations from other languages). The only exceptions are text classes where translated material is common in Czech in general, listed in the table below (the rest of the classes are 100% Czech originals). | ||
- | ^ Class ^ Translations (words) ^ Originals (words) ^ % Translations | + | ^ Class ^ Translations (words) ^ Originals (words) ^ % translations |
| LOV | 210,250 | 30,981 | 87.2% | | | LOV | 210,250 | 30,981 | 87.2% | | ||
| CRM | 202,921 | 37,677 | 84.3% | | | CRM | 202,921 | 37,677 | 84.3% | |