Differences

This shows you the differences between two versions of the page.

en:cnk:nkjp [2018/11/02 18:19] – language proofreading – lukes
en:cnk:nkjp [2018/11/06 10:29] – [Text classification] numbers – adrianzasina
Line 23:
  
 ^Communication layer^ doc.genre ^ Category ^ Proportion ^
-| written | #typ_publ | journalism |  48,85 %|
+| written | #typ_publ | journalism |  48.85%|
-| ::: | #typ_lit | fiction |  17,04 %|
+| ::: | #typ_lit | fiction |  17.04%|
-| ::: | #typ_fakt | non-fiction |  5,34 %|
+| ::: | #typ_fakt | non-fiction |  5.34%|
-| ::: | #typ_inf-por | informative texts |  5,62 %|
+| ::: | #typ_inf-por | informative texts |  5.62%|
-| ::: | #typ_urzed | legal texts |  2,97 %|
+| ::: | #typ_urzed | legal texts |  2.97%|
-| ::: | #typ_nd | popular science texts |  1,91 %|
+| ::: | #typ_nd | popular science texts |  1.91%|
-| ::: | #typ_nklas | non-fiction |  1,00 %|
+| ::: | #typ_nklas | non-fiction unclassified book |  1.00%|
-| ::: | #typ_listy | correspondence|  0,04 %|
+| ::: | #typ_listy | correspondence|  0.04%|
-| ::: | #typ_lit_poezja | poetry |  0,01 %|
+| ::: | #typ_lit_poezja | poetry |  0.01%|
-| spoken | #typ_qmow | quasi-spoken texts |  2,50 %|
+| spoken | #typ_qmow | quasi-spoken texts |  2.50%|
-| ::: | #typ_media | spoken media text |  2,07 %|
+| ::: | #typ_media | spoken media text |  2.07%|
-| ::: | #typ_konwers | spoken conversational texts |  5,57 %|
+| ::: | #typ_konwers | spoken conversational texts |  5.57%|
-| web | #typ_net_interakt | interaction-based Internet texts |  5,18 %|
+| web | #typ_net_interakt | interaction-based Internet texts |  5.18%|
-| ::: | #typ_net_nieinterakt | non-interaction-based Internet texts |  1,91 %|
+| ::: | #typ_net_nieinterakt | non-interaction-based Internet texts |  1.91%|
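In DokuWiki table syntax, a cell that contains only `:::` continues the cell above it. This means every `:::` row in the genre table belongs to the most recently named communication layer. The following minimal Python sketch relies on that rule to total the Proportion column per layer for the newer revision of the table. It assumes the rows are available as plain DokuWiki source lines. `ROWS` and `proportions_by_layer` are hypothetical names used only for illustration.

```python
# Rows of the newer revision of the genre table, header row omitted.
ROWS = [
    "| written | #typ_publ | journalism |  48.85%|",
    "| ::: | #typ_lit | fiction |  17.04%|",
    "| ::: | #typ_fakt | non-fiction |  5.34%|",
    "| ::: | #typ_inf-por | informative texts |  5.62%|",
    "| ::: | #typ_urzed | legal texts |  2.97%|",
    "| ::: | #typ_nd | popular science texts |  1.91%|",
    "| ::: | #typ_nklas | non-fiction unclassified book |  1.00%|",
    "| ::: | #typ_listy | correspondence|  0.04%|",
    "| ::: | #typ_lit_poezja | poetry |  0.01%|",
    "| spoken | #typ_qmow | quasi-spoken texts |  2.50%|",
    "| ::: | #typ_media | spoken media text |  2.07%|",
    "| ::: | #typ_konwers | spoken conversational texts |  5.57%|",
    "| web | #typ_net_interakt | interaction-based Internet texts |  5.18%|",
    "| ::: | #typ_net_nieinterakt | non-interaction-based Internet texts |  1.91%|",
]


def proportions_by_layer(rows):
    """Sum the Proportion column per communication layer.

    A first cell holding only ":::" is DokuWiki's row-span marker, so the
    row is credited to the layer named in the nearest row above it.
    """
    totals = {}
    layer = None
    for row in rows:
        # Drop the outer pipes, split into cells and trim the padding.
        cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
        if cells[0] != ":::":
            layer = cells[0]
        share = float(cells[3].rstrip("%"))
        totals[layer] = totals.get(layer, 0.0) + share
    # Round to the table's two decimal places to hide floating-point noise.
    return {name: round(value, 2) for name, value in totals.items()}


if __name__ == "__main__":
    print(proportions_by_layer(ROWS))
    # {'written': 82.78, 'spoken': 10.14, 'web': 7.09}
```

The three layer totals add up to 100.01%. The extra 0.01 comes from rounding the individual category values, not from a missing or duplicated row.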
  
 ===== Positional annotation and tagging =====
  
-Compared to typical corpora of Czech, NKJP_1M additionally has a positional attribute which is specific for Polish, the so-called **flexeme**. It is a category which further subdivides parts of speech into more specific lexeme classes. For instance, within nouns (//subst//), depreciative nouns (//depr//) form one of the flexeme subgroups; flexemes also distinguish between regular adjectives (//adj//), compound adjectives (//adja//, e.g. //__biało__-czerwony//, //__sportowo__-rekreacyjny//), post-prepositional adjectives (//adjp//, e.g. //po __polsku__//, //od __dawna__//), and predicative adjectives (//adjc//, e.g. //jestem __pewien__//, //był __wesół__ i __zdrów__//); and there is a particularly fine-grained subcategorization of verbs (more than 10 different flexemes).
+Compared to typical corpora of Czech, NKJP_1M additionally has a positional attribute which is specific for Polish, the so-called **flexeme**. It is a category which further subdivides parts of speech into more specific lexeme classes. For instance, within nouns (//subst//), depreciative nouns (//depr//) form one of the flexeme subgroups; flexemes also distinguish between regular adjectives (//adj//), the first part of compound adjectives (//adja//, e.g. //__biało__-czerwony//, //__sportowo__-rekreacyjny//), post-prepositional adjectives (//adjp//, e.g. //po __polsku__//, //od __dawna__//), and predicative adjectives (//adjc//, e.g. //jestem __pewien__//, //był __wesół__ i __zdrów__//); and there is a particularly fine-grained subcategorization of verbs (more than 10 different flexemes).
  
 Moreover, the Polish tagset differs from the Czech one; its detailed description (including the full flexeme list) is available [[http://nkjp.pl/poliqarp/help/ense2.html|here]].
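Because flexemes subdivide parts of speech, several flexeme values can collapse into one coarse class. The minimal Python sketch below illustrates this. It assumes tokens arrive as (word, flexeme) pairs and covers only the flexemes named in the paragraph above; the full list is in the NKJP tagset description. The names `FLEXEME_TO_POS` and `group_by_pos`, as well as the coarse labels "noun" and "adjective", are hypothetical and are not NKJP tags. The example words are the underlined forms from the flexeme paragraph, paired with the flexeme the text gives for each.

```python
# Only the flexemes named in the paragraph above; the coarse labels are
# illustrative and are not part of the NKJP tagset.
FLEXEME_TO_POS = {
    "subst": "noun",
    "depr": "noun",
    "adj": "adjective",
    "adja": "adjective",
    "adjp": "adjective",
    "adjc": "adjective",
}


def group_by_pos(tokens):
    """Group (word, flexeme) pairs by coarse part of speech, then by flexeme."""
    groups = {}
    for word, flexeme in tokens:
        # Flexemes missing from the small table above go to a catch-all bucket.
        pos = FLEXEME_TO_POS.get(flexeme, "other")
        groups.setdefault(pos, {}).setdefault(flexeme, []).append(word)
    return groups


# Underlined words from the examples in the flexeme paragraph.
tokens = [
    ("biało", "adja"),
    ("sportowo", "adja"),
    ("polsku", "adjp"),
    ("dawna", "adjp"),
    ("pewien", "adjc"),
    ("wesół", "adjc"),
    ("zdrów", "adjc"),
]

if __name__ == "__main__":
    print(group_by_pos(tokens))
```

All seven words end up under the single coarse class "adjective" while still keeping their three distinct flexemes. This is the extra level of detail that NKJP_1M provides compared to a plain part-of-speech tag.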