Differences

This shows you the differences between two versions of the page.

en:cnk:intercorp:verze14 [2022/01/14 15:20] – [Texts in the corpus] alexandrrosen
en:cnk:intercorp:verze14 [2024/04/18 16:00] (current) – [Morphosyntactic annotation] michalkren
Line 1: Line 1:
-~~NOTOC~~ 
 ====== InterCorp Release 14 ====== ====== InterCorp Release 14 ======
- 
-numbers: TODO! 
  
 ^ Name ^^ Czech -- core ^ Czech -- collections ^ other -- core ^ other -- collections ^ ^ Name ^^ Czech -- core ^ Czech -- collections ^ other -- core ^ other -- collections ^
-^ Positions ^ Number of tokens |  141,032,521 |  116,673,043 |  394,042,551 |  1,550,071,364 +^ Positions ^ Number of tokens |  145,640,866 |  116,673,038 |  418,967,492 |  1,548,425,287 
-^ ::: ^ Number of word forms |  113,838,505 |  89,819,773 |   327,968,369 |  1,223,270,610 +^ ::: ^ Number of word forms |  117,606,467 |  89,819,772 |   348,771,933 |  1,223,221,264 
-^ Structural attributes ^ Number of documents |  1,657 |  30 |  3,993 |   282 | +^ Structural attributes ^ Number of documents |  1,708 |  30 |  4,220 |   282 | 
-^ ::: ^ Number of texts |  1,657 |  111,951 |  3,993 |  1,843,528 | +^ ::: ^ Number of texts |  1,708 |  111,951 |  4,220 |  1,843,528 | 
-^ ::: ^ Number of sentences |  9,782,001 |  13,606,183 |  24,305,621 |  143,195,566 |+^ ::: ^ Number of sentences |  10,095,074 |  136,606,183 |  25,872,393 |  143,195,566 |
 ^ Further information ^ reference |  YES   ^^^^ ^ Further information ^ reference |  YES   ^^^^
 ^ ::: ^ representative |  NO  ^^^^ ^ ::: ^ representative |  NO  ^^^^
Line 52: Line 49:
 ^  Language  ^^  Core  ^  Syndicate  ^  Presseurop  ^  Acquis  ^  Europarl  ^  Subtitles  ^  Bible  ^  Total  ^ ^  Language  ^^  Core  ^  Syndicate  ^  Presseurop  ^  Acquis  ^  Europarl  ^  Subtitles  ^  Bible  ^  Total  ^
 ^  ar  ^ Arabic |  34 |  0 |  0 |  0 |  0 |  0 |  0 |  34 | ^  ar  ^ Arabic |  34 |  0 |  0 |  0 |  0 |  0 |  0 |  34 |
-^  be  ^ Belarusian |  5,718 |  0 |  0 |  0 |  0 |  0 |  0 |  5,718 +^  be  ^ Belarusian|  6 094 |  0 |  0 |  0 |  0 |  0 |  0 |  6 094 
-^  bg  ^ Bulgarian |  7,068 |  0 |  0 |  13,577 |  9,083 |  0 |  0 |  29,728 | +^  bg  ^ Bulgarian |  7 068 |  0 |  0 |  13 577 |  9 083 |  0 |  0 |  29 728 | 
-^  ca  ^ Catalan |  7,938 |  0 |  0 |  0 |  0 |  0 |  736 |  8,674 +^  ca  ^ Catalan |  8 920 |  0 |  0 |  0 |  0 |  0 |  736 |  9 656 
-^  da  ^ Danish |  7,136 |  0 |  0 |  20,313 |  13,916 |  14,429 |  657 |  56,451 +^  da  ^ Danish |  7 576 |  0 |  0 |  20 313 |  13 916 |  14 429 |  657 |  56 891 
-^  de  ^ German |  37,633 |  4,704 |  2,483 |  20,610 |  13,088 |  8,392 |  724 |  87,634 +^  de  ^ German |  38 475 |  4 704 |  2 483 |  20 610 |  13 088 |  8 392 |  724 |  88 476 
-^  el  ^ Greek |  0 |  0 |  0 |  23,853 |  15,404 |  23,709 |  0 |  62,966 | +^  el  ^ Greek |  0 |  0 |  0 |  23 853 |  15 404 |  23 709 |  0 |  62 966 | 
-^  en  ^ English |  33,569 |  4,856 |  2,670 |  22,902 |  15,576 |  52,106 |  730 |  132,409 +^  en  ^ English |  36 198 |  4 856 |  2 670 |  22 902 |  15 576 |  52 106 |  730 |  135 038 
-^  es  ^ Spanish |  26,554 |  5,614 |  2,859 |  26,262 |  16,249 |  36,650 |  0 |  114,187 +^  es  ^ Spanish |  28 115 |  5 614 |  2 859 |  26 262 |  16 249 |  36 650 |  0 |  115 748 
-^  et  ^ Estonian |  0 |  0 |  0 |  14,896 |  10,899 |  10,298 |  0 |  36,093 | +^  et  ^ Estonian |  0 |  0 |  0 |  14 896 |  10 899 |  10 298 |  0 |  36 093 | 
-^  fi  ^ Finnish |  5,656 |  0 |  0 |  15,269 |  10,108 |  15,047 |  543 |  46,622 +^  fi  ^ Finnish |  6 226 |  0 |  0 |  15 269 |  10 108 |  15 047 |  543 |  47 192 
-^  fr  ^ French |  19,773 |  5,600 |  3,046 |  26,200 |  17,179 |  25,986 |  764 |  98,547 +^  fr  ^ French |  21 279 |  5 600 |  3 046 |  26 200 |  17 179 |  25 986 |  764 |  100 054 
-^  he  ^ Hebrew |  0 |  0 |  0 |  0 |  0 |  16,221 |  0 |  16,221 |+^  he  ^ Hebrew |  0 |  0 |  0 |  0 |  0 |  16 221 |  0 |  16 221 |
 ^  hi  ^ Hindi |  409 |  0 |  0 |  0 |  0 |  0 |  0 |  409 | ^  hi  ^ Hindi |  409 |  0 |  0 |  0 |  0 |  0 |  0 |  409 |
-^  hr  ^ Croatian |  21,923 |  0 |  0 |  0 |  0 |  19,048 |  571 |  41,543 +^  hr  ^ Croatian |  22 736 |  0 |  0 |  0 |  0 |  19 048 |  571 |  42 356 | 
-^  hu  ^ Hungarian |  6,444 |  0 |  0 |  17,852 |  12,198 |  21,115 |  0 |  57,609 | +^  hs  ^ Upper Sorbian |  110 |  0 |  0 |  0 |  0 |  0 |  0 |  110 
-^  is  ^ Icelandic |  0 |  0 |  0 |  0 |  0 |  1,581 |  0 |  1,581 | +^  hu  ^ Hungarian |  6 444 |  0 |  0 |  17 852 |  12 198 |  21 115 |  0 |  57 609 | 
-^  it  ^ Italian |  14,525 |  1,252 |  2,747 |  23,771 |  15,494 |  14,700 |  684 |  73,174 +^  is  ^ Icelandic|  0 |  0 |  0 |  0 |  0 |  1 581 |  0 |  1 581 | 
-^  ja  ^ Japanese |  2,189 |  0 |  0 |  0 |  0 |  477 |  0 |  2,666 +^  it  ^ Italian |  15 741 |  1 252 |  2 747 |  23 771 |  15 494 |  14 700 |  684 |  74 389 
-^  lt  ^ Lithuanian |  421 |  0 |  0 |  17,316 |  11,213 |  558 |  471 |  29,979 +^  ja  ^ Japanese |  3 147 |  0 |  0 |  0 |  0 |  477 |  0 |  3 624 
-^  lv  ^ Latvian |  2,646 |  0 |  0 |  17,522 |  11,682 |  280 |  537 |  32,667 +^  lt  ^ Lithuanian|  502 |  0 |  0 |  17 316 |  11 213 |  558 |  471 |  30 059 
-^  mk  ^ Macedonian |  8,881 |  0 |  0 |  0 |  0 |  1,877 |  0 |  10,758 | +^  lv  ^ Latvian |  3 031 |  0 |  0 |  17 522 |  11 682 |  280 |  537 |  33 052 
-^  ms  ^ Malay |  0 |  0 |  0 |  0 |  0 |  3,521 |  0 |  3,521 | +^  mk  ^ Macedonian |  8 881 |  0 |  0 |  0 |  0 |  1 877 |  0 |  10 758 | 
-^  mt  ^ Maltese |  0 |  0 |  0 |  13,935 |  0 |  0 |  0 |  13,935 | +^  ms  ^ Malay |  0 |  0 |  0 |  0 |  0 |  3 521 |  0 |  3 521 | 
-^  nl  ^ Dutch |  16,216 |  813 |  2,953 |  23,416 |  15,558 |  29,373 |  717 |  89,045 +^  mt  ^ Maltese |  0 |  0 |  0 |  13 935 |  0 |  0 |  0 |  13 935 | 
-^  no  ^ Norwegian |  7,727 |  0 |  0 |  0 |  0 |  0 |  722 |  8,449 +^  nl  ^ Dutch |  16 691 |  813 |  2 953 |  23 416 |  15 558 |  29 373 |  717 |  89 520 
-^  pl  ^ Polish |  26,200 |  0 |  2,380 |  19,604 |  12,817 |  26,576 |  583 |  88,161 +^  no  ^ Norwegian |  7 818 |  0 |  0 |  0 |  0 |  0 |  722 |  8 540 
-^  pt  ^ Portuguese |  4,981 |  554 |  2,782 |  24,598 |  15,193 |  41,468 |  706 |  90,282 |+^  pl  ^ Polish |  27 669 |  0 |  2 380 |  19 604 |  12 817 |  26 576 |  583 |  89 630 
 +^  pt  ^ Portuguese |  6 245 |  554 |  2 782 |  24 598 |  15 193 |  41 468 |  706 |  91 546 |
 ^  rn  ^ Romani |  14 |  0 |  0 |  0 |  0 |  0 |  0 |  14 | ^  rn  ^ Romani |  14 |  0 |  0 |  0 |  0 |  0 |  0 |  14 |
-^  ro  ^ Romanian |  4,219 |  0 |  2,738 |  8,092 |  9,446 |  34,128 |  0 |  58,622 | +^  ro  ^ Romanian |  4 219 |  0 |  2 738 |  8 092 |  9 446 |  34 128 |  0 |  58 622 | 
-^  ru  ^ Russian |  8,642 |  3,984 |  0 |  0 |  0 |  6,887 |  565 |  20,078 +^  ru  ^ Russian |  10 510 |  3 984 |  0 |  0 |  0 |  6 887 |  565 |  21 946 
-^  sk  ^ Slovak |  8,543 |  0 |  0 |  18,399 |  12,727 |  5,133 |  561 |  45,363 | +^  sk  ^ Slovak |  8 543 |  0 |  0 |  18 399 |  12 727 |  5 133 |  561 |  45 363 | 
-^  sl  ^ Slovene |  3,871 |  0 |  0 |  18,528 |  12,251 |  17,061 |  0 |  51,711 +^  sl  ^ Slovene |  4 097 |  0 |  0 |  18 515 |  12 241 |  17 035 |  0 |  51 888 
-^  sq  ^ Albanian |  0 |  0 |  0 |  0 |  0 |  2,003 |  0 |  2,003 | +^  sq  ^ Albanian |  0 |  0 |  0 |  0 |  0 |  2 003 |  0 |  2 003 | 
-^  sr  ^ Serbian |  11,582 |  0 |  0 |  0 |  0 |  20,727 |  0 |  32,308 +^  sr  ^ Serbian |  12 014 |  0 |  0 |  0 |  0 |  20 727 |  0 |  32 741 
-^  sv  ^ Swedish |  15,790 |  0 |  0 |  19,542 |  13,784 |  14,666 |  638 |  64,419 +^  sv  ^ Swedish |  17 590 |  0 |  0 |  19 542 |  13 784 |  14 666 |  638 |  66 220 
-^  tr  ^ Turkish |  0 |  0 |  0 |  0 |  0 |  21,190 |  0 |  21,190 | +^  tr  ^ Turkish |  0 |  0 |  0 |  0 |  0 |  21 190 |  0 |  21 190 | 
-^  uk  ^ Ukrainian |  11,459 |  0 |  0 |  0 |  0 |  244 |  596 |  12,299 +^  uk  ^ Ukrainian |  12 172 |  0 |  0 |  0 |  0 |  244 |  596 |  13 011 
-^  vi  ^ Vietnamese |  0 |  0 |  0 |  0 |  0 |  1,474 |  0 |  1,474 | +^  vi  ^ Vietnamese |  0 |  0 |  0 |  0 |  0 |  1 474 |  0 |  1 474 | 
-^  zh  ^ Chinese |  127 |  240 |  0 |  0 |  0 |  2,247 |  0 |  2,614 +^  zh  ^ Chinese |  202 |  240 |  0 |  0 |  0 |  2 247 |  0 |  2 689 
-^ **Subtotal**  ^|  327,887 |  27,616 |  24,658 |  406,459 |  263,864 |  489,169 |  11,504 |  1,551,157 +^ **Subtotal** ^ |  348 770 |  27 617 |  24 658 |  406 444 |  263 855 |  489 146 |  11 505 |  1 571 991 
-^  cs  ^ Czech |  113,839 |  4,351 |  2,310 |  19,085 |  12,908 |  50,604 |  562 |  203,658 +^  cs  ^ Czech |  117 606 |  4 351 |  2 310 |  19 085 |  12 908 |  50 604 |  562 |  207 426 
-^ **TOTAL**  ^|  441,725 |  31,967 |  26,968 |  425,543 |  276,772 |  539,774 |  12,066 |  1,754,815 |+^ **TOTAL** ^ |  466 376 |  31 968 |  26 968 |  425 529 |  276 763 |  539 750 |  12 067 |  1 779 417 |
  
 N.B.: Each Czech text is counted only once, even though it may have more than one foreign counterpart. N.B.: Each Czech text is counted only once, even though it may have more than one foreign counterpart.
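The note above implies that, in every column, the TOTAL row should equal the Subtotal of the foreign-language rows plus the single Czech row. A minimal sketch of that check follows, using the figures from the newer revision of the table; the dictionary and function names are illustrative only and are not part of the wiki or of any corpus tool.

```python
# Figures copied from the newer revision of the table above.
# All names below are hypothetical, chosen for this illustration.
COLUMNS = ["Core", "Syndicate", "Presseurop", "Acquis",
           "Europarl", "Subtitles", "Bible", "Total"]

SUBTOTAL = dict(zip(COLUMNS, [348_770, 27_617, 24_658, 406_444,
                              263_855, 489_146, 11_505, 1_571_991]))
CZECH = dict(zip(COLUMNS, [117_606, 4_351, 2_310, 19_085,
                           12_908, 50_604, 562, 207_426]))
TOTAL = dict(zip(COLUMNS, [466_376, 31_968, 26_968, 425_529,
                           276_763, 539_750, 12_067, 1_779_417]))


def column_mismatches():
    """Return the columns where Subtotal + Czech differs from TOTAL."""
    return [col for col in COLUMNS
            if SUBTOTAL[col] + CZECH[col] != TOTAL[col]]


if __name__ == "__main__":
    bad = column_mismatches()
    print("all columns consistent" if not bad else f"mismatch in: {bad}")
```

Only the column-wise identity is checked here; the sketch does not test whether the "Total" column matches the sum of the collection columns in each row.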
Line 100: Line 98:
 Texts in the following languages have received some morphosyntactic annotation. The format and often even the meaning of categories encoded in the morphosyntactic tags differ in most languages. Thus for each tagged language we provide a link to the tagset description. After selecting CQL as the query type, the tagset description is also available from the KonText search interface. Texts in the following languages have received some morphosyntactic annotation. The format and often even the meaning of categories encoded in the morphosyntactic tags differ in most languages. Thus for each tagged language we provide a link to the tagset description. After selecting CQL as the query type, the tagset description is also available from the KonText search interface.
  
-^  Language  ^  Tags  ^  Lemmas  ^  Brief description  ^  Detailed description  ^ Tags in the corpus ^ Tool  ^ +^  Language  ^  Tags  ^  Lemmas  ^  Brief description  ^  Detailed description  ^ Tool  ^ 
-^ Belarusian |  ✔  |   ✔    [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%)  |  [[https://universaldependencies.org/be/index.html#morphology|in English]]%%****%%)    [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_be&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://ufal.mff.cuni.cz/udpipe/2|UDPipe]] +^ Belarusian |  ✔  |   ✔    [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%)  |  [[https://universaldependencies.org/be/index.html#morphology|in English]]%%****%%)  | [[http://ufal.mff.cuni.cz/udpipe/2|UDPipe]] 
-^ Bulgarian |  ✔  |   ✔    [[https://www.sketchengine.eu/bulgarian-treebank-part-of-speech-tagset/|in English]]    [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/BTB-TR03_BulTreeBank_morphosyntactic_tag.pdf|in English]]  |   [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_bg&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ Bulgarian |  ✔  |   ✔    [[https://www.sketchengine.eu/bulgarian-treebank-part-of-speech-tagset/|in English]]    [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/BTB-TR03_BulTreeBank_morphosyntactic_tag.pdf|in English]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ Catalan |  ✔  |  ✔  |  [[http://clic.ub.edu/corpus/webfm_send/18|in English]]  |     |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_ca&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ Catalan |  ✔  |  ✔  |  [[http://clic.ub.edu/corpus/webfm_send/18|in English]]  |     | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ Chinese |  ✔  |    |  [[https://www.sketchengine.eu/chinese-penn-treebank-part-of-speech-tagset/|in English]]  |  [[https://repository.upenn.edu/cgi/viewcontent.cgi?article=1039&context=ircs_reports|in English]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_zh&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://www.sutd.edu.sg/cmsresource/faculty/yuezhang/zpar.html|ZPar v0.7.5]] +^ Chinese |  ✔  |    |  [[https://www.sketchengine.eu/chinese-penn-treebank-part-of-speech-tagset/|in English]]  |  [[https://repository.upenn.edu/cgi/viewcontent.cgi?article=1039&context=ircs_reports|in English]]  | [[https://www.sutd.edu.sg/cmsresource/faculty/yuezhang/zpar.html|ZPar v0.7.5]] 
-^ Croatian |  ✔  |  ✔  |   [[https://github.com/ffnlp/sethr/blob/master/mte4r-upos.mapping|in English]]  |  [[http://nlp.ffzg.hr/data/tagging/msd-hr.html|in English]]   |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_hr&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]]   | +^ Croatian |  ✔  |  ✔  |   [[https://github.com/ffnlp/sethr/blob/master/mte4r-upos.mapping|in English]]  |  [[http://nlp.ffzg.hr/data/tagging/msd-hr.html|in English]]   | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]]   | 
-^ Czech |  ✔  |  ✔  |  [[http://wiki.korpus.cz/doku.php/seznamy:tagy|in Czech]] and [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html|English]] |  [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/docc0pos.pdf|in English]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_cs&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://ufal.mff.cuni.cz/morce/index.php|Morče]] +^ Czech |  ✔  |  ✔  |  [[http://wiki.korpus.cz/doku.php/seznamy:tagy|in Czech]] and [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html|English]] |  [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/docc0pos.pdf|in English]]  | [[http://ufal.mff.cuni.cz/morce/index.php|Morče]] 
-^ Dutch |  ✔  |  ✔    |   [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/dutch-tagset.txt|in English]]  |   |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_nl&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ Dutch |  ✔  |  ✔    |   [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/dutch-tagset.txt|in English]]  |   | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ English |  ✔    ✔  |  [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/PennTreebankTags.pdf|in English]]  | [[http://utkl.ff.cuni.cz/%7Erosen/public/Penn-Treebank-Tagset.pdf|in English]] + [[http://utkl.ff.cuni.cz/%7Erosen/public/PennTagAdd.html|additions]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_en&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ English |  ✔    ✔  |  [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/PennTreebankTags.pdf|in English]]  | [[http://utkl.ff.cuni.cz/%7Erosen/public/Penn-Treebank-Tagset.pdf|in English]] + [[http://utkl.ff.cuni.cz/%7Erosen/public/PennTagAdd.html|additions]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ Estonian |  ✔  |  ✔  |  [[http://www.cl.ut.ee/korpused/morfliides/seletus|in Estonian and English]]  |     |   [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_et&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ Estonian |  ✔  |  ✔  |  [[http://www.cl.ut.ee/korpused/morfliides/seletus|in Estonian and English]]  |     | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ Finnish |  ✔  |  ✔  |  [[https://www.sketchengine.co.uk/finntreebank|in English]]%%*%%)  |  [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/sources/FinnTreeBankManual.pdf|in English]]%%*%%)   [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_fi&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  |[[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/omorfi/README.shtml|OMorFi]] +[[https://code.google.com/archive/p/hunpos/|HunPOS]] +^ Finnish |  ✔  |  ✔  |  [[https://www.sketchengine.co.uk/finntreebank|in English]]%%*%%)  |  [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/sources/FinnTreeBankManual.pdf|in English]]%%*%%)  | [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/omorfi/README.shtml|OMorFi]] +[[https://code.google.com/archive/p/hunpos/|HunPOS]] 
-^ French |  ✔  |  ✔  |  [[https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/french-tagset.html|in English]]  |     |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_fr&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ French |  ✔  |  ✔  |  [[https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/french-tagset.html|in English]]  |     |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ German |  ✔  |  ✔  |  [[https://www.sketchengine.co.uk/German-rftagger-part-of-speech-tagset/|in English]] %%**%%)  |  [[http://utkl.ff.cuni.cz/%7Erosen/public/stts_guide.pdf|in German]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_de&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] +^ German |  ✔  |  ✔  |  [[https://www.sketchengine.co.uk/German-rftagger-part-of-speech-tagset/|in English]] %%**%%)  |  [[http://utkl.ff.cuni.cz/%7Erosen/public/stts_guide.pdf|in German]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] 
-^ Hungarian |  ✔  |        |  [[http://www.inf.u-szeged.hu/projectdirs/hlt/en/Szeged%20Treebank%202.0_en.html|in English]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_hu&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] +^ Hungarian |  ✔  |        |  [[http://www.inf.u-szeged.hu/projectdirs/hlt/en/Szeged%20Treebank%202.0_en.html|in English]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] 
-^ Icelandic |  ✔  |  ✔  |  [[http://www.malfong.is/files/ot_tagset_files_en.pdf|in English]]    [[http://nlp.cs.ru.is/pdf/Tagset.pdf|in English]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_is&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|IceStagger]] +^ Icelandic |  ✔  |  ✔  |  [[http://www.malfong.is/files/ot_tagset_files_en.pdf|in English]]    [[http://nlp.cs.ru.is/pdf/Tagset.pdf|in English]]  | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|IceStagger]] 
-^ Italian |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-tagset.txt|in English]]       |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_it&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ Italian |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-tagset.txt|in English]]       |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ Japanese |  ✔  |  ✔  |  [[https://www.sketchengine.eu/tagset-jp-mecab/|in English]]       |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_ja&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://taku910.github.io/mecab/|MeCab]] + [[https://unidic.ninjal.ac.jp|Unidic]] +^ Japanese |  ✔  |  ✔  |  [[https://www.sketchengine.eu/tagset-jp-mecab/|in English]]       | [[https://taku910.github.io/mecab/|MeCab]] + [[https://unidic.ninjal.ac.jp|Unidic]] 
-^ Latvian |  ✔  |  ✔  |   [[http://www.semti-kamols.lv/doc_upl/TagSet.html|in Latvian]]  |     |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_lv&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://peteris.rocks/blog/latvian-part-of-speech-tagging|LVTagger]] +^ Latvian |  ✔  |  ✔  |   [[http://www.semti-kamols.lv/doc_upl/TagSet.html|in Latvian]]  |     | [[https://peteris.rocks/blog/latvian-part-of-speech-tagging|LVTagger]] 
-^ Norwegian |  ✔  |  ✔  |  [[http://tekstlab.uio.no/obt-ny/english/tagset.html|in English]] and [[http://tekstlab.uio.no/obt-ny/index.html|Norwegian]]  |    |    [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_no&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://github.com/noklesta/The-Oslo-Bergen-Tagger|Oslo-Bergen Tagger]] +^ Norwegian |  ✔  |  ✔  |  [[http://tekstlab.uio.no/obt-ny/english/tagset.html|in English]] and [[http://tekstlab.uio.no/obt-ny/index.html|Norwegian]]  |    | [[https://github.com/noklesta/The-Oslo-Bergen-Tagger|Oslo-Bergen Tagger]] 
-^ Polish |  ✔  |  ✔  |  [[http://nkjp.pl/poliqarp/help/ense2.html#x3-20002|in English]] and [[http://nkjp.pl/poliqarp/help/plse2.html#x3-20002|Polish]]  |  [[http://nlp.ipipan.waw.pl/%7Eadamp/Papers/2003-eacl-ws12/|in English]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_pl&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  |[[http://sgjp.pl/morfeusz/|Morfeusz]], [[https://github.com/kwrobel-nlp/krnnt|KRNNT]] +^ Polish |  ✔  |  ✔  |  [[http://nkjp.pl/poliqarp/help/ense2.html#x3-20002|in English]] and [[http://nkjp.pl/poliqarp/help/plse2.html#x3-20002|Polish]]  |  [[http://nlp.ipipan.waw.pl/%7Eadamp/Papers/2003-eacl-ws12/|in English]]  |[[http://sgjp.pl/morfeusz/|Morfeusz]], [[https://github.com/kwrobel-nlp/krnnt|KRNNT]] 
-^ Portuguese |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/Portuguese-Tagset.html|in Spanish]]  |     |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_pt&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ Portuguese |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/Portuguese-Tagset.html|in Spanish]]  |     | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ Russian |  ✔  |  ✔  |  [[http://corpus.leeds.ac.uk/mocky/ru-table.tab|in English]]  |  [[http://nl.ijs.si/ME/V4/msd/html/msd-ru.html|in English]] %%***%%)   [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_ru&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ Russian |  ✔  |  ✔  |  [[http://corpus.leeds.ac.uk/mocky/ru-table.tab|in English]]  |  [[http://nl.ijs.si/ME/V4/msd/html/msd-ru.html|in English]] %%***%%)  |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ Slovak |  ✔  |  ✔  |  [[http://korpus.sk/morpho.html/|in Slovak]] and [[https://korpus.sk/morpho_en.html/|English]]  |  [[https://korpus.sk/attachments/morpho_en/tagset-www.pdf|in Slovak]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_sk&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Radovan Garabík, Morče]] +^ Slovak |  ✔  |  ✔  |  [[http://korpus.sk/morpho.html/|in Slovak]] and [[https://korpus.sk/morpho_en.html/|English]]  |  [[https://korpus.sk/attachments/morpho_en/tagset-www.pdf|in Slovak]]  | [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Radovan Garabík, Morče]] 
-^ Slovene |  ✔  |  ✔  |    [[http://nl.ijs.si/jos/msd/html-en/josMSD-en.html|in English]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_sl&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]] +^ Slovene |  ✔  |  ✔  |    [[http://nl.ijs.si/jos/msd/html-en/josMSD-en.html|in English]]  | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]] 
-^ Serbian |  ✔  |  ✔  |  [[https://www.sketchengine.eu/multext-east-serbian-part-of-speech-tagset/|in English]]  |   [[http://nl.ijs.si/ME/V4/msd/html/msd-sr.html|in English]]   |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_sr&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]]   | +^ Serbian |  ✔  |  ✔  |  [[https://www.sketchengine.eu/multext-east-serbian-part-of-speech-tagset/|in English]]  |   [[http://nl.ijs.si/ME/V4/msd/html/msd-sr.html|in English]]   | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]]   | 
-^ Spanish |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/spanish-tagset.txt|in English]]  |     |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_es&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ Spanish |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/spanish-tagset.txt|in English]]  |     | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ Swedish |  ✔  |  ✔  |  [[http://spraakbanken.gu.se/korp/markup/msdtags.html|in Swedish and English]]       |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_sv&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|Stagger]] +^ Swedish |  ✔  |  ✔  |  [[http://spraakbanken.gu.se/korp/markup/msdtags.html|in Swedish and English]]       | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|Stagger]] 
-^ Ukrainian |  ✔  |  ✔  |  |  [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%)   [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_uk&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://ufal.mff.cuni.cz/udpipe/2|UDPipe]]  |+^ Ukrainian |  ✔  |  ✔  |  |  [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%)  | [[http://ufal.mff.cuni.cz/udpipe/2|UDPipe]]  |
  
  
Line 144: Line 142:
  
 Morphological tags including characters with a special meaning in regular expressions, e.g. "%%$%%" in the English tag "wp%%$%%", must be preceded in queries by a backslash: tag="wp\$". Morphological tags including characters with a special meaning in regular expressions, e.g. "%%$%%" in the English tag "wp%%$%%", must be preceded in queries by a backslash: tag="wp\$".
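The escaping rule above can be applied mechanically when queries are built by a script. The following is a minimal sketch, not part of KonText: the helper names are hypothetical, and the set of characters treated as special is an assumption based on common regular-expression syntax rather than on the query engine's documentation.

```python
# Characters treated here as regex metacharacters. The exact set honoured by
# the query engine is an assumption of this sketch.
REGEX_SPECIALS = set(r"\.^$*+?()[]{}|")


def escape_tag(tag: str) -> str:
    """Prefix every assumed metacharacter in a literal tag with a backslash."""
    return "".join("\\" + ch if ch in REGEX_SPECIALS else ch for ch in tag)


def tag_expression(tag: str) -> str:
    """Build an attribute expression of the form tag="..." for a literal tag."""
    return f'tag="{escape_tag(tag)}"'


if __name__ == "__main__":
    # The English tag from the text above.
    print(tag_expression("wp$"))
```

For the English tag "wp$" this yields the same expression as the one quoted in the text, with the dollar sign preceded by a backslash. Tags without special characters pass through unchanged.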
-====Structural attributes====+=====Structural attributes=====
  
 ^Structure^Attribute^Description^Values^ ^Structure^Attribute^Description^Values^
Line 235: Line 233:
 When citing a specific part of InterCorp please use the reference displayed in KonText in the corpus description, e.g. as: When citing a specific part of InterCorp please use the reference displayed in KonText in the corpus description, e.g. as:
  
-Rosen, A., Vavřín, M., Zasina, A. J. (2022). //The InterCorp Corpus – Czech((Insert languages actually used.)), version 14 of 17 January 2022//. Institute of the Czech National Corpus, Charles University, Prague 2020. Available on-line: https://kontext.korpus.cz/+Rosen, A., Vavřín, M., Zasina, A. J. (2022). //The InterCorp Corpus – Czech((Insert languages actually used.)), version 14 of 31 January 2022//. Institute of the Czech National Corpus, Charles University, Prague 2022. Available on-line: https://kontext.korpus.cz/
  
 </WRAP> </WRAP>
Line 242: Line 240:
  
 <WRAP round box 51%> <WRAP round box 51%>
-[[en:cnk:intercorp|InterCorp]] • [[en:cnk:intercorp:verze12|Version 12]] • [[en:cnk:intercorp:verze11|Version 11]]  • [[en:cnk:intercorp:verze10|Version 10]] • [[en:cnk:intercorp:verze9|Version 9]] • [[en:cnk:intercorp:verze8|Version 8]] • [[en:cnk:intercorp:verze7|Version 7]] • [[en:cnk:intercorp:verze6|Version 6]] • [[en:cnk:intercorp:verze5|Version 5]] • [[en:cnk:intercorp:verze4|Verze 4]] • [[en:cnk:intercorp:verze3|Version 3]] • [[en:cnk:intercorp:historie|Version history]]+[[en:cnk:intercorp|InterCorp]] • [[en:cnk:intercorp:verze13ud|Version 13ud]] • [[en:cnk:intercorp:verze13|Version 13]] • [[en:cnk:intercorp:verze12|Version 12]] • [[en:cnk:intercorp:verze11|Version 11]]  • [[en:cnk:intercorp:verze10|Version 10]] • [[en:cnk:intercorp:verze9|Version 9]] • [[en:cnk:intercorp:verze8|Version 8]] • [[en:cnk:intercorp:verze7|Version 7]] • [[en:cnk:intercorp:verze6|Version 6]] • [[en:cnk:intercorp:verze5|Version 5]] • [[en:cnk:intercorp:verze4|Verze 4]] • [[en:cnk:intercorp:verze3|Version 3]] • [[en:cnk:intercorp:historie|Version history]]
  
 See [[https://intercorp.korpus.cz/?lang=en|the original InterCorp site in English]]. See [[https://intercorp.korpus.cz/?lang=en|the original InterCorp site in English]].
 </WRAP> </WRAP>