Differences
This shows you the differences between two versions of the page.
Older revision (-): en:cnk:nkjp [2018/11/02 18:19] – language proofreading, lukes
Newer revision (+): en:cnk:nkjp [2018/11/06 10:29] – [Text classification] numbers, adrianzasina
Line 23:
^Communication layer^ doc.genre ^ Category ^ Proportion ^
- | written | #typ_publ | journalism | 48,85 %|
+ | written | #typ_publ | journalism | 48.85%|
- | ::: | #typ_lit | fiction | 17,04 %|
+ | ::: | #typ_lit | fiction | 17.04%|
- | ::: | #typ_fakt | non-fiction | 5,34 %|
+ | ::: | #typ_fakt | non-fiction | 5.34%|
- | ::: | #
+ | ::: | #
- | ::: | #typ_urzed | legal texts | 2,97 %|
+ | ::: | #typ_urzed | legal texts | 2.97%|
- | ::: | #typ_nd | popular science texts | 1,91 %|
+ | ::: | #typ_nd | popular science texts | 1.91%|
- | ::: | #typ_nklas | non-fiction | 1,00 %|
+ | ::: | #typ_nklas | non-fiction |
- | ::: | #typ_listy | correspondence|
+ | ::: | #typ_listy | correspondence|
- | ::: | #
+ | ::: | #
- | spoken | #typ_qmow | quasi-spoken texts | 2,50 %|
+ | spoken | #typ_qmow | quasi-spoken texts | 2.50%|
- | ::: | #typ_media | spoken media text | 2,07 %|
+ | ::: | #typ_media | spoken media text | 2.07%|
- | ::: | #
+ | ::: | #
- | web | #
+ | web | #
- | ::: | #
+ | ::: | #
===== Positional annotation and tagging =====
- Compared to typical corpora of Czech, NKJP_1M additionally has a positional attribute specific to Polish, the so-called **flexeme**. It is a category that further subdivides parts of speech into more specific lexeme classes. For instance, within nouns (//
+ Compared to typical corpora of Czech, NKJP_1M additionally has a positional attribute specific to Polish, the so-called **flexeme**. It is a category that further subdivides parts of speech into more specific lexeme classes. For instance, within nouns (//
Moreover, the Polish tagset differs from the Czech one; its detailed description (including the full flexeme list) is available [[http://
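To illustrate what a flexeme-level attribute adds, here is a minimal Python sketch that groups tokens by flexemic class. It assumes NKJP-style positional tags in which the first colon-separated field names the flexemic class (e.g. `subst` for ordinary nouns, `depr` for depreciative noun forms). The sample tags and the helper names `flexeme` and `count_flexemes` are hypothetical illustrations, not part of the corpus tools; in NKJP_1M itself the flexeme is stored as its own positional attribute, so it can be queried directly rather than derived from the tag.

```python
from collections import Counter

# Hypothetical sample of NKJP-style tags. Assumption: the first
# colon-separated field of each tag is the flexemic class.
SAMPLE_TAGS = [
    "subst:sg:nom:m1",    # ordinary noun
    "subst:pl:gen:f",     # ordinary noun
    "depr:pl:nom:m2",     # depreciative noun form
    "adj:sg:nom:m1:pos",  # adjective
    "fin:sg:ter:imperf",  # finite verb form
]


def flexeme(tag: str) -> str:
    """Return the flexemic class, i.e. everything before the first colon."""
    return tag.split(":", 1)[0]


def count_flexemes(tags):
    """Count how many tags fall into each flexemic class."""
    return Counter(flexeme(t) for t in tags)


if __name__ == "__main__":
    # Both subst and depr are nominal, yet they stay distinct at flexeme level.
    for cls, n in count_flexemes(SAMPLE_TAGS).most_common():
        print(cls, n)
```

Because tags without grammatical categories contain no colon, `flexeme` simply returns the whole tag for them.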