Differences
This shows you the differences between two versions of the page.
| en:cnk:nkjp [2018/11/06 10:29] – [Text classification] numbers adrianzasina | en:cnk:nkjp [2018/11/12 16:09] (current) – [Corpus NKJP_1M] michalkren | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ~~NOTOC~~ | ~~NOTOC~~ | ||
| - | ====== | + | ====== |
| The NKJP_1M corpus is a manually annotated one million word subcorpus of the [[http:// | The NKJP_1M corpus is a manually annotated one million word subcorpus of the [[http:// | ||
| Line 6: | Line 6: | ||
| <WRAP right 35%> | <WRAP right 35%> | ||
| ^ <fs medium> | ^ <fs medium> | ||
| - | ^ Positions ^ Number of positions (tokens) | 1 215 513 | | + | ^ Positions ^ Number of positions (tokens) | 1,215,513 | |
| - | ^ ::: ^ Number of positions (excl. punctuation) | 992 014 | | + | ^ ::: ^ Number of positions (excl. punctuation) | 992,014 | |
| - | ^ ::: ^ Number of word forms | 143 477 | | + | ^ ::: ^ Number of word forms | 143,477 | |
| - | ^ ::: ^ Number of lemmas | 54 174 | | + | ^ ::: ^ Number of lemmas | 54,174 | |
| - | ^ Structures ^ Number of documents <doc> | 3 889 | | + | ^ Structures ^ Number of documents <doc> | 3,889 | |
| - | ^ ::: ^ Number of paragraphs <p> | 18 484 | | + | ^ ::: ^ Number of paragraphs <p> | 18,484 | |
| - | ^ ::: ^ Number of sentences <s> | 85 663 | | + | ^ ::: ^ Number of sentences <s> | 85,663 | |
| ^ Further information ^ Reference corpus | YES | | ^ Further information ^ Reference corpus | YES | | ||
| ^ ::: ^ Representative corpus | YES | | ^ ::: ^ Representative corpus | YES | | ||