Differences

This shows you the differences between two versions of the page.

en:cnk:nkjp [2018/11/06 10:29] – [Text classification] numbers adrianzasina
en:cnk:nkjp [2018/11/12 16:09] (current) – [Corpus NKJP_1M] michalkren
Line 1:
 ~~NOTOC~~
-====== Corpus NKJP_1M ======
+====== The NKJP_1M corpus ======
  
 The NKJP_1M corpus is a manually annotated one million word subcorpus of the [[http://nkjp.pl| National Corpus of Polish]] (NKJP – //Narodowy Korpus Języka Polskiego//), composed of various text samples (see below). It is a corpus of contemporary Polish with texts published after the year 1945; it contains written, spoken and web communication. The corpus features lemmatisation, morphological annotation, and representative coverage of text categories.
Line 6:
 <WRAP right 35%> <WRAP right 35%>
 ^ <fs medium>Name</fs> ^^ <fs medium>NKJP_1M</fs> ^
-^ Positions ^ Number of positions (tokens) |  1 215 513 |
-^ ::: ^ Number of positions (excl. punctuation) |  992 014 |
-^ ::: ^ Number of word forms |  143 477 |
-^ ::: ^ Number of lemmas |  54 174 |
-^ Structures ^ Number of documents <doc> |  3 889 |
-^ ::: ^ Number of paragraphs <p> |  18 484 |
-^ ::: ^ Number of sentences <s> |  85 663 |
+^ Positions ^ Number of positions (tokens) |  1,215,513 |
+^ ::: ^ Number of positions (excl. punctuation) |  992,014 |
+^ ::: ^ Number of word forms |  143,477 |
+^ ::: ^ Number of lemmas |  54,174 |
+^ Structures ^ Number of documents <doc> |  3,889 |
+^ ::: ^ Number of paragraphs <p> |  18,484 |
+^ ::: ^ Number of sentences <s> |  85,663 |
 ^ Further information ^ Reference corpus |  YES |
 ^ ::: ^ Representative corpus |  YES |