
Differences

This shows you the differences between two versions of the page.

en:cnk:nkjp [2018/11/05 12:22] – [Positional annotation and tagging] adrianzasina
en:cnk:nkjp [2018/11/06 10:33] – [Corpus NKJP_1M] numbers adrianzasina
Line 6:
 <WRAP right 35%>
 ^ <fs medium>Name</fs> ^^ <fs medium>NKJP_1M</fs> ^
-^ Positions ^ Number of positions (tokens) |  1 215 513 |
+^ Positions ^ Number of positions (tokens) |  1,215,513 |
-^ ::: ^ Number of positions (excl. punctuation) |  992 014 |
+^ ::: ^ Number of positions (excl. punctuation) |  992,014 |
-^ ::: ^ Number of word forms |  143 477 |
+^ ::: ^ Number of word forms |  143,477 |
-^ ::: ^ Number of lemmas |  54 174 |
+^ ::: ^ Number of lemmas |  54,174 |
-^ Structures ^ Number of documents <doc> |  3 889 |
+^ Structures ^ Number of documents <doc> |  3,889 |
-^ ::: ^ Number of paragraphs <p> |  18 484 |
+^ ::: ^ Number of paragraphs <p> |  18,484 |
-^ ::: ^ Number of sentences <s> |  85 663 |
+^ ::: ^ Number of sentences <s> |  85,663 |
 ^ Further information ^ Reference corpus |  YES |
 ^ ::: ^ Representative corpus |  YES |
Line 23:

 ^Communication layer^ doc.genre ^ Category ^ Proportion ^
-| written | #typ_publ | journalism |  48,85 %|
+| written | #typ_publ | journalism |  48.85%|
-| ::: | #typ_lit | fiction |  17,04 %|
+| ::: | #typ_lit | fiction |  17.04%|
-| ::: | #typ_fakt | non-fiction |  5,34 %|
+| ::: | #typ_fakt | non-fiction |  5.34%|
-| ::: | #typ_inf-por | informative texts |  5,62 %|
+| ::: | #typ_inf-por | informative texts |  5.62%|
-| ::: | #typ_urzed | legal texts |  2,97 %|
+| ::: | #typ_urzed | legal texts |  2.97%|
-| ::: | #typ_nd | popular science texts |  1,91 %|
+| ::: | #typ_nd | popular science texts |  1.91%|
-| ::: | #typ_nklas | non-fiction unclassified book |  1,00 %|
+| ::: | #typ_nklas | non-fiction unclassified book |  1.00%|
-| ::: | #typ_listy | correspondence|  0,04 %|
+| ::: | #typ_listy | correspondence|  0.04%|
-| ::: | #typ_lit_poezja | poetry |  0,01 %|
+| ::: | #typ_lit_poezja | poetry |  0.01%|
-| spoken | #typ_qmow | quasi-spoken texts |  2,50 %|
+| spoken | #typ_qmow | quasi-spoken texts |  2.50%|
-| ::: | #typ_media | spoken media text |  2,07 %|
+| ::: | #typ_media | spoken media text |  2.07%|
-| ::: | #typ_konwers | spoken conversational texts |  5,57 %|
+| ::: | #typ_konwers | spoken conversational texts |  5.57%|
-| web | #typ_net_interakt | interaction-based Internet texts |  5,18 %|
+| web | #typ_net_interakt | interaction-based Internet texts |  5.18%|
-| ::: | #typ_net_nieinterakt | non-interaction-based Internet texts |  1,91 %|
+| ::: | #typ_net_nieinterakt | non-interaction-based Internet texts |  1.91%|
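
The revision changes only number formatting: space-grouped counts become comma-grouped, and decimal-comma percentages with a space before the sign become decimal-point percentages without one. A minimal Python sketch of that normalization follows. The function names `normalize_count`, `normalize_percent`, and `total_percent` are illustrative assumptions, not part of the wiki or of any corpus tool, and the values are copied from the old side of the comparison.

```python
import re
from decimal import Decimal


def normalize_count(text: str) -> str:
    """Rewrite a space-grouped integer such as '1 215 513' as '1,215,513'."""
    digits = re.sub(r"\s+", "", text)
    return f"{int(digits):,}"


def normalize_percent(text: str) -> str:
    """Rewrite a decimal-comma percentage such as '48,85 %' as '48.85%'.

    The number is handled as a string so trailing zeros ('1,00') survive.
    """
    number = text.replace("%", "").strip().replace(",", ".")
    return f"{number}%"


def total_percent(values: list[str]) -> Decimal:
    """Sum old-style percentages exactly, avoiding float rounding."""
    return sum(
        (Decimal(normalize_percent(v).rstrip("%")) for v in values),
        Decimal("0"),
    )


# Proportions as written on the old side of the genre table.
OLD_PROPORTIONS = [
    "48,85 %", "17,04 %", "5,34 %", "5,62 %", "2,97 %", "1,91 %", "1,00 %",
    "0,04 %", "0,01 %", "2,50 %", "2,07 %", "5,57 %", "5,18 %", "1,91 %",
]

if __name__ == "__main__":
    print(normalize_count("1 215 513"))
    print([normalize_percent(p) for p in OLD_PROPORTIONS])
    print(total_percent(OLD_PROPORTIONS))
```

Summing the fourteen listed proportions this way gives 100.01, so the table does not add up to exactly 100%, presumably because each row was rounded independently.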
  
 ===== Positional annotation and tagging =====