Differences

This shows you the differences between two versions of the page.

en:cnk:intercorp:verze11 [2019/10/06 20:43] – [Taggers/lemmatizers:] michalskrabal
en:cnk:intercorp:verze11 [2019/12/20 00:22] – [InterCorp Release 12] alexandrrosen
Line 1: Line 1:
 ~~NOTOC~~
-====== InterCorp Release 11 ======
+====== InterCorp Release 12 ======
 
 ^ Name ^^ Czech -- core ^ Czech -- collections ^ other -- core ^ other -- collections ^
-^ Positions ^ Number of tokens |   132,508,429 |  115,574,528 |  340,554,768 |  1,550,923,096 |
+^ Positions ^ Number of tokens |  137 059 021 |  116 673 027 |  373 873 819 |  1 549 570 665 |
-^ ::: ^ Number of word forms |   106,898,538 |  88,872,779 |  283,075,338 |  1,225,361,750 |
+^ ::: ^ Number of word forms |   110 588 784 |  89 819 765 |  310 914 295 |  1 222 868 666 |
-^ Structural attributes ^ Number of documents |  1,564 |  28 |  3,494 |   261 |
+^ Structural attributes ^ Number of documents |  1 619 |  30 |  3 806 |   281 |
-^ ::: ^ Number of texts |   1,507 |  111,672 |  3,232 |  1,841,341 |
+^ ::: ^ Number of texts |  619 |  111 951 |  3 806 |  1 843 489 |
-^ ::: ^ Number of sentences |  9,193,433 |  13,556,382 |  21,000,997 |  142,734,659 |
+^ ::: ^ Number of sentences |  9 518 229 |  13 606 183 |  23 076 128 |  143 165 959 |
 ^ Further information ^ reference |  YES   ^^^^
 ^ ::: ^ representative |  NO  ^^^^
-^ ::: ^ publication date |  2018  ^^^^
+^ ::: ^ publication date |  2019  ^^^^
-^ ::: ^ foreign languages |  39  ^^^^
+^ ::: ^ foreign languages |  40  ^^^^
 ^ ::: ^ tagged languages |  26  ^^^^
 ^ ::: ^ lemmatized languages |  25  ^^^^
Line 54: Line 54:
  
  
-[{{:cnk:intercorp:intercorp_wordcounts_v11.png|Setup of the parallel corpus – the core and collections}}]
+[{{:cnk:intercorp:intercorp_wordcounts_v12.png|Setup of the parallel corpus – the core and collections}}]
 
-[{{:cnk:intercorp:intercorp_wordcounts2_v11.png|Setup of the parallel corpus – the core}}]
+[{{:cnk:intercorp:intercorp_wordcounts2_v12.png|Setup of the parallel corpus – the core}}]
 
-[{{:cnk:intercorp:intercorp_wordcounts3_v11.png|Setup of the parallel corpus – collections}}]
+[{{:cnk:intercorp:intercorp_wordcounts3_v12.png|Setup of the parallel corpus – collections}}]
  
 ===== Corpus size in thousands of words =====
 
 ^ Language ^^ Core ^ Syndicate ^ Presseurop ^ Acquis ^ Europarl ^ Subtitles ^ Bible ^ Total ^
-| ar | Arabic |  34 |  0 |  0 |  0 |  0 |  0 |  0 |  34 |
+|  ar  | Arabic |  34 |  0 |  0 |  0 |  0 |  0 |  0 |  34 |
-| be | Belarusian |  4,426 |  0 |  0 |  0 |  0 |  0 |  0 |  4,426 |
+|  be  | Belarusian |  5 319 |  0 |  0 |  0 |  0 |  0 |  0 |  5 319 |
-| bg | Bulgarian |  6,780 |  0 |  0 |  13,577 |  9,083 |  0 |  0 |  29,441 |
+|  bg  | Bulgarian |  7 068 |  0 |  0 |  13 577 |  9 083 |  0 |  0 |  29 728 |
-| ca | Catalan |  5,596 |  0 |  0 |  0 |  0 |  0 |  736 |  6,332 |
+|  ca  | Catalan |  7 481 |  0 |  0 |  0 |  0 |  0 |  736 |  8 217 |
-| da | Danish |  5,595 |  0 |  0 |  20,313 |  13,916 |  14,429 |  657 |  54,910 |
+|  da  | Danish |  6 654 |  0 |  0 |  20 313 |  13 916 |  14 429 |  657 |  55 968 |
-| de | German |  34,915 |  4,457 |  2,483 |  20,610 |  13,088 |  8,392 |  724 |  84,669 |
+|  de  | German |  36 373 |  4 704 |  2 483 |  20 610 |  13 088 |  8 392 |  724 |  86 374 |
-| el | Greek |  0 |  0 |  0 |  23,853 |  15,404 |  23,709 |  0 |  62,966 |
+|  el  | Greek |  0 |  0 |  0 |  23 853 |  15 404 |  23 709 |  0 |  62 966 |
-| en | English |  27,968 |  4,604 |  2,670 |  22,902 |  15,576 |  52,105 |  730 |  126,555 |
+|  en  | English |  32 152 |  4 856 |  2 670 |  22 902 |  15 576 |  52 105 |  730 |  130 992 |
-| es | Spanish |  23,349 |  5,322 |  2,859 |  26,262 |  16,249 |  36,650 |  0 |  110,691 |
+|  es  | Spanish |  25 595 |  5 614 |  2 859 |  26 262 |  16 249 |  36 650 |  0 |  113 228 |
-| et | Estonian |  0 |  0 |  0 |  14,896 |  10,899 |  10,298 |  0 |  36,093 |
+|  et  | Estonian |  0 |  0 |  0 |  14 896 |  10 899 |  10 298 |  0 |  36 093 |
-| fi | Finnish |  4,585 |  0 |  0 |  15,489 |  10,175 |  15,098 |  544 |  45,890 |
+|  fi  | Finnish |  5 329 |  0 |  0 |  15 269 |  10 108 |  15 047 |  543 |  46 296 |
-| fr | French |  17,213 |  5,391 |  3,046 |  26,200 |  17,179 |  25,986 |  764 |  95,779 |
+|  fr  | French |  18 241 |  5 600 |  3 046 |  26 200 |  17 179 |  25 986 |  764 |  97 016 |
-| he | Hebrew |  0 |  0 |  0 |  0 |  0 |  16,221 |  0 |  16,221 |
+|  he  | Hebrew |  0 |  0 |  0 |  0 |  0 |  16 221 |  0 |  16 221 |
-| hi | Hindu |  409 |  0 |  0 |  0 |  0 |  0 |  0 |  409 |
+|  hi  | Hindi |  409 |  0 |  0 |  0 |  0 |  0 |  0 |  409 |
-| hr | Croatian |  20,147 |  0 |  0 |  0 |  0 |  19,048 |  571 |  39,767 |
+|  hr  | Croatian |  21 027 |  0 |  0 |  0 |  0 |  19 048 |  571 |  40 646 |
-| hu | Hungarian |  5 783 |  0 |  0 |  17 852 |  12 198 |  21 115 |  0 |  56 948 |
+|  hu  | Hungarian |  5 783 |  0 |  0 |  17 852 |  12 198 |  21 115 |  0 |  56 948 |
-| is | Icelandic |  0 |  0 |  0 |  0 |  0 |  1,581 |  0 |  1,581 |
+|  is  | Icelandic |  0 |  0 |  0 |  0 |  0 |  1 581 |  0 |  1 581 |
-| it | Italian |  11,400 |  1,141 |  2,747 |  23,771 |  15,494 |  14,700 |  684 |  69,937 |
+|  it  | Italian |  13 251 |  1 252 |  2 747 |  23 771 |  15 494 |  14 700 |  684 |  71 899 |
-| ja | Japanese |  1,198 |  0 |  0 |  0 |  0 |  477 |  0 |  1,675 |
+|  ja  | Japanese |  1 747 |  0 |  0 |  0 |  0 |  477 |  0 |  2 224 |
-| lt | Lithuanian |  287 |  0 |  0 |  17,316 |  11,213 |  558 |  471 |  29,844 |
+|  lt  | Lithuanian |  421 |  0 |  0 |  17 316 |  11 213 |  558 |  471 |  29 979 |
-| lv | Latvian |  2,523 |  0 |  0 |  17,522 |  11,682 |  280 |  |  32,008 |
+|  lv  | Latvian |  2 646 |  0 |  0 |  17 522 |  11 682 |  280 |  135 |  32 265 |
-| mk | Macedonian |  6,508 |  0 |  0 |  0 |  0 |  1,877 |  0 |  8,385 |
+|  mk  | Macedonian |  8 000 |  0 |  0 |  0 |  0 |  1 877 |  0 |  9 877 |
-| ms | Malay |  0 |  0 |  0 |  0 |  0 |  3,521 |  0 |  3,521 |
+|  ms  | Malay |  0 |  0 |  0 |  0 |  0 |  3 521 |  0 |  3 521 |
-| mt | Maltese |  0 |  0 |  0 |  13,953 |  0 |  0 |  0 |  13,953 |
+|  mt  | Maltese |  0 |  0 |  0 |  13 953 |  0 |  0 |  0 |  13 953 |
-| nl | Dutch |  13,689 |  711 |  2,953 |  23,416 |  15,558 |  29,373 |  717 |  86,416 |
+|  nl  | Dutch |  15 127 |  813 |  2 953 |  23 416 |  15 558 |  29 373 |  717 |  87 956 |
-| no | Norwegian |  6,675 |  0 |  0 |  0 |  0 |  0 |  721 |  7,397 |
+|  no  | Norwegian |  7 151 |  0 |  0 |  0 |  0 |  0 |  721 |  7 872 |
-| pl | Polish |  24,292 |  0 |  2,378 |  19,594 |  12,811 |  26,572 |  583 |  86,230 |
+|  pl  | Polish |  25 606 |  0 |  2 380 |  19 604 |  12 817 |  26 575 |  583 |  87 567 |
-| pt | Portuguese |  4,032 |  520 |  3,000 |  27,301 |  16,485 |  43,392 |  760 |  95,489 |
+|  pt  | Portuguese |  4 095 |  554 |  2 782 |  24 598 |  15 193 |  41 468 |  706 |  89 396 |
-| rn | Romani |  14 |  0 |  0 |  0 |  0 |  0 |  0 |  14 |
+|  rn  | Romani |  14 |  0 |  0 |  0 |  0 |  0 |  0 |  14 |
-| ro | Romanian |  3,888 |  0 |  2,738 |  8,092 |  9,446 |  34,128 |  0 |  58,292 |
+|  ro  | Romanian |  3 888 |  0 |  2 738 |  8 092 |  9 446 |  34 128 |  0 |  58 292 |
-| ru | Russian |  7,062 |  3,768 |  0 |  0 |  0 |  6,887 |  565 |  18,282 |
+|  ru  | Russian |  8 123 |  3 984 |  0 |  0 |  0 |  6 887 |  565 |  19 560 |
-| sk | Slovak |  8,545 |  0 |  0 |  18,401 |  12,734 |  5,134 |  561 |  45,376 |
+|  sk  | Slovak |  8 543 |  0 |  0 |  18 399 |  12 726 |  5 133 |  561 |  45 363 |
-| sl | Slovenian |  3,534 |  0 |  0 |  18,485 |  12,241 |  17,023 |  0 |  51,282 |
+|  sl  | Slovene |  3 740 |  0 |  0 |  18 528 |  12 251 |  17 061 |  0 |  51 580 |
-| sq | Albanian |  0 |  0 |  0 |  0 |  0 |  2,003 |  0 |  2,003 |
+|  sq  | Albanian |  0 |  0 |  0 |  0 |  0 |  2 003 |  0 |  2 003 |
-| sr | Serbian |  10,661 |  0 |  0 |  0 |  0 |  20,727 |  0 |  31,388 |
+|  sr  | Serbian |  10 961 |  0 |  0 |  0 |  0 |  20 727 |  0 |  31 688 |
-| sv | Swedish |  12,396 |  0 |  0 |  19,609 |  13,840 |  14,694 |  638 |  61,178 |
+|  sv  | Swedish |  15 320 |  0 |  0 |  19 542 |  13 784 |  14 666 |  638 |  63 950 |
-| tr | Turkish |  0 |  0 |  0 |  0 |  0 |  21,190 |  0 |  21,190 |
+|  tr  | Turkish |  0 |  0 |  0 |  0 |  0 |  21 190 |  0 |  21 190 |
-| uk | Ukrainian |  9,571 |  0 |  0 |  0 |  0 |  245 |  596 |  10,411 |
+|  uk  | Ukrainian |  10 817 |  0 |  0 |  0 |  0 |  244 |  596 |  11 657 |
-| vi | Vietnamese |  0 |  0 |  0 |  0 |  0 |  1,474 |  0 |  1,474 |
+|  vi  | Vietnamese |  0 |  0 |  0 |  0 |  0 |  1 474 |  0 |  1 474 |
-| **Subtotal** |   |  283,075 |  30,044 |  27,189 |  428,621 |  278,178 |  539,250 |  11,593 |  1,676 293 |
-| cs | Czech |  106,899 |  4,124 |  2,310 |  19,085 |  12,188 |  50,604 |  562 |  195,771 |
-| **TOTAL** |   |  389,974 |  30,073 |  27,184 |  428,482 |  277,458 |  539,489 |  11,585 |  1,704,208 |
+|  zh  | Chinese |  0 |  240 |  0 |  0 |  0 |  2 246 688 |  0 |  2 487 |
+| **Total** |  |  303 772 |  27 616 |  24 658 |  406 459 |  263 864 |  489 170 |  11 102 |  1 526 633 |
+|  cs  | Czech |  110 573 |  4 351 |  2 310 |  19 085 |  12 908 |  50 604 |  562 |  200 393 |
+| **TOTAL** |  |  414 345 |  31 967 |  26 968 |  425 543 |  276 772 |  539 774 |  11 664 |  1 727 026 |
  
 N.B.: Each Czech text is counted only once, even though it may have more than one foreign counterpart.
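Read together with the N.B., the Release 12 table adds up as TOTAL = Total (all foreign languages) + Czech, with Czech contributing a single row. The short Python sketch below checks that column by column. The figures are copied by hand from the `+` rows of the table, and the constant and function names are illustrative only, not part of any InterCorp or KonText tool. Because the values are rounded to thousands of words, a difference of 1 is tolerated.

```python
"""Check that the Release 12 TOTAL row equals the foreign-language Total plus Czech.

Values are copied from the '+' (Release 12) rows of the table and are in
thousands of words, so a difference of 1 is treated as a rounding artefact.
"""

COLUMNS = ["Core", "Syndicate", "Presseurop", "Acquis", "Europarl", "Subtitles", "Bible", "Total"]

FOREIGN_TOTAL = "303 772 | 27 616 | 24 658 | 406 459 | 263 864 | 489 170 | 11 102 | 1 526 633"
CZECH = "110 573 | 4 351 | 2 310 | 19 085 | 12 908 | 50 604 | 562 | 200 393"
GRAND_TOTAL = "414 345 | 31 967 | 26 968 | 425 543 | 276 772 | 539 774 | 11 664 | 1 727 026"


def parse_row(cells: str) -> list[int]:
    """Turn '303 772 | 27 616 | ...' into [303772, 27616, ...].

    Release 12 groups digits with spaces, Release 11 with commas; both are stripped.
    """
    return [int(cell.replace(" ", "").replace(",", "")) for cell in cells.split("|")]


def check_totals(foreign: str, czech: str, grand: str, tolerance: int = 1) -> dict[str, int]:
    """Return {column: grand - (foreign + czech)} for every column outside the tolerance."""
    mismatches = {}
    for name, f, c, g in zip(COLUMNS, parse_row(foreign), parse_row(czech), parse_row(grand)):
        diff = g - (f + c)
        if abs(diff) > tolerance:
            mismatches[name] = diff
    return mismatches


if __name__ == "__main__":
    print(check_totals(FOREIGN_TOTAL, CZECH, GRAND_TOTAL))
```

Run on these rows it reports no mismatches; with `tolerance=0` only the Acquis column differs, by 1 thousand words, which is consistent with rounding.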
Line 113: Line 114:
 ^  Language  ^  Tags  ^  Lemmas  ^  Brief description  ^  Detailed description  ^  Tool  ^
 ^ Belarusian |  ✔  |   ✔        [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%)  |  [[https://web.archive.org/web/20170122231904/http://lindat.mff.cuni.cz/services/udpipe/api-reference.php|UDPipe]]  |
-^ Bulgarian |  ✔  |   ✔    |     |  [[http://bultreebank.org/en/resources/short-description-dependency-part-bultreebank-bultreebank-dp/btb-tr03-2/|in English]]  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
+^ Bulgarian |  ✔  |   ✔    |  [[https://www.sketchengine.eu/bulgarian-treebank-part-of-speech-tagset/|in English]]  |  [[http://bultreebank.org/en/resources/short-description-dependency-part-bultreebank-bultreebank-dp/btb-tr03-2/|in English]]  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
 ^ Catalan |  ✔  |  ✔  |  [[http://clic.ub.edu/corpus/webfm_send/18|in English]]  |      [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
+^ Chinese |  ✔  |    |  [[https://www.sketchengine.eu/chinese-penn-treebank-part-of-speech-tagset/|in English]]  |  [[https://repository.upenn.edu/cgi/viewcontent.cgi?article=1039&context=ircs_reports|in English]]  |  [[https://www.sutd.edu.sg/cmsresource/faculty/yuezhang/zpar.html|ZPar v0.7.5]]  |
 ^ Croatian |  ✔  |  ✔  |   [[https://github.com/ffnlp/sethr/blob/master/mte4r-upos.mapping|in English]]  |      [[https://github.com/uzh/reldi|ReLDI Tagger]]   |
 ^ Czech |  ✔  |  ✔  |  [[http://wiki.korpus.cz/doku.php/seznamy:tagy|in Czech]] and [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html|English]]  |  [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/docc0pos.pdf|in English]]  |  [[http://ufal.mff.cuni.cz/morce/index.php|Morče]]  |
-^ Dutch |  ✔  |   ✔    |   [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/dutch-tagset.txt|in English]]  [[http://www.inl.nl/tst-centrale/images/stories/producten/documentatie/ehc_handleiding_nl.pdf|in Dutch]]  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
+^ Dutch |  ✔  |   ✔    |   [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/dutch-tagset.txt|in English]]   |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
 ^ English |  ✔    ✔  |  [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/PennTreebankTags.pdf|in English]]  | [[http://utkl.ff.cuni.cz/%7Erosen/public/Penn-Treebank-Tagset.pdf|in English]] + [[http://utkl.ff.cuni.cz/%7Erosen/public/PennTagAdd.html|additions]]  |  [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]]  |
 ^ Estonian |  ✔  |  ✔  |  [[http://www.cl.ut.ee/korpused/morfliides/seletus| in Estonian and English]]  |      [[http://http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
-^ Finnish |  ✔  |  ✔  |  [[https://www.sketchengine.co.uk/finntreebank/|in English]]%%*%%)  |  [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/sources/FinnTreeBankManual.pdf|in English]]%%*%%)  |  [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/omorfi/README.shtml|OMorFi]] +[[https://code.google.com/archive/p/hunpos/|HunPOS]]  |
+^ Finnish |  ✔  |  ✔  |  [[https://www.sketchengine.co.uk/finntreebank/|in English]]%%*%%)  |  [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/sources/FinnTreeBankManual.pdf|in English]]%%*%%)  |  [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/omorfi/README.shtml|OMorFi]] + [[https://code.google.com/archive/p/hunpos/|HunPOS]]  |
 ^ French |  ✔  |  ✔  |  [[http://www.ims.uni-stuttgart.de/%7Eschmid/french-tagset.html|in English]]  |      [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
 ^ German |  ✔  |  ✔  |  [[https://www.sketchengine.co.uk/German-rftagger-part-of-speech-tagset/|in English]]%%**%%  |  [[http://utkl.ff.cuni.cz/%7Erosen/public/stts_guide.pdf|in German]]  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]]  |
-^ Hungarian |  ✔  |         [[http://nl.ijs.si/ME/Vault/V3/msd/html/msd.html#SECTION05400000000000000000|in English]]  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]]  |
+^ Hungarian |  ✔  |    [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v12_hu&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=|List]]  |  [[http://www.inf.u-szeged.hu/projectdirs/hlt/en/Szeged%20Treebank%202.0_en.html|in English]]   [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]]  |
 ^ Icelandic |  ✔  |  ✔  |  [[http://www.malfong.is/files/ot_tagset_files_en.pdf|in English]]        [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|IceStagger]]  |
 ^ Italian |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-tagset.txt|in English]]  |      [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
-^ Japanese |  ✔  |  ✔  |  [[https://www.sketchengine.eu/tagset-jp-mecab/|in English]]        [[https://taku910.github.io/mecab/|MeCab]]  |
+^ Japanese |  ✔  |  ✔  |  [[https://www.sketchengine.eu/tagset-jp-mecab/|in English]]        [[https://taku910.github.io/mecab/|MeCab]] + [[https://unidic.ninjal.ac.jp|Unidic]]  |
 ^ Latvian |  ✔  |  ✔  |   [[http://www.semti-kamols.lv/doc_upl/TagSet.html|in Latvian]]  |      [[https://peteris.rocks/blog/latvian-part-of-speech-tagging|LVTagger]]  |
 ^ Norwegian |  ✔  |  ✔  | [[http://tekstlab.uio.no/obt-ny/english/tagset.html|in English]] and [[http://tekstlab.uio.no/obt-ny/index.html|Norwegian]]  |      [[https://visl.sdu.dk/remoting.html|VISL]]  |
-^ Polish |  ✔  |  ✔  |  [[http://nkjp.pl/poliqarp/help/ense2.html#x3-20002|in English]] and [[http://nkjp.pl/poliqarp/help/plse2.html#x3-20002|Polish]]  |  [[http://nlp.ipipan.waw.pl/%7Eadamp/Papers/2003-eacl-ws12/|in English]]  |  [[http://sgjp.pl/morfeusz/|Morfeusz]][[http://nlp.pwr.wroc.pl/takipi/|TaKIPI]]  |
+^ Polish |  ✔  |  ✔  |  [[http://nkjp.pl/poliqarp/help/ense2.html#x3-20002|in English]] and [[http://nkjp.pl/poliqarp/help/plse2.html#x3-20002|Polish]]  |  [[http://nlp.ipipan.waw.pl/%7Eadamp/Papers/2003-eacl-ws12/|in English]]  |  [[http://sgjp.pl/morfeusz/|Morfeusz]] [[https://github.com/kwrobel-nlp/krnnt|KRNNT]]   |
 ^ Portuguese |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/Portuguese-Tagset.html|in Spanish]]  |      [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
 ^ Russian |  ✔  |  ✔  |  [[http://corpus.leeds.ac.uk/mocky/ru-table.tab|in English]]  |  [[http://nl.ijs.si/ME/V4/msd/html/msd-ru.html|in English]]%%***%%  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
-^ Slovak |  ✔  |  ✔  |  [[http://korpus.sk/morpho.html/|in Slovak]]  |  [[http://korpus.sk/attachments/publications/2004-garabik-gianitsova-horak-simkova-tokenizacia.pdf|in Slovak]]  |  [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Radovan Garabík, Morče]]  |
+^ Slovak |  ✔  |  ✔  |  [[https://korpus.sk/morpho_en.html/|in English]]  |  [[https://korpus.sk/attachments/morpho_en/tagset-www.pdf|in Slovak]]  |  [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Radovan Garabík, Morče]]  |
-^ Slovene |  ✔  |  ✔  |    |  [[http://nl.ijs.si/ME/V4/msd/html/msd-sl.html|in English]]  |  [[http://nl2.ijs.si/analyze/|ToTaLe]]  |
+^ Slovene |  ✔  |  ✔  |  [[https://www.sketchengine.eu/slovene-tagset-multext-east-v3/|in English]]  |  [[http://nl.ijs.si/ME/V4/msd/html/msd-sl.html|in English]]  |  [[http://nl2.ijs.si/analyze/|ToTaLe]]  |
-^ Serbian |  ✔  |  ✔  |     |   [[http://nl.ijs.si/ME/V4/msd/html/msd-sr.html|in English]]  |  [[https://github.com/uzh/reldi|ReLDI Tagger]]   |
+^ Serbian |  ✔  |  ✔  |  [[https://www.sketchengine.eu/multext-east-serbian-part-of-speech-tagset/|in English]]  |   [[http://nl.ijs.si/ME/V4/msd/html/msd-sr.html|in English]]  |  [[https://github.com/uzh/reldi|ReLDI Tagger]]   |
 ^ Spanish |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/spanish-tagset.txt|in English]]  |      [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
 ^ Swedish |  ✔  |  ✔  |  [[http://spraakbanken.gu.se/korp/markup/msdtags.html|in Swedish and English]]        [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|Stagger]]  |
-^ Ukrainian |  ✔  |  ✔  |  [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%)        |  [[https://web.archive.org/web/20170122231904/http://lindat.mff.cuni.cz/services/udpipe/api-reference.php|UDPipe]]  |
+^ Ukrainian |  ✔  |  ✔    [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%)   |  [[https://web.archive.org/web/20170122231904/http://lindat.mff.cuni.cz/services/udpipe/api-reference.php|UDPipe]]  |
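The Hungarian tag list in the table above is a live KonText word-list query rather than a static document. A minimal Python sketch, assuming the query parameters visible in that link are all that is needed, rebuilds the URL with `urllib.parse.urlencode` so that the corpus name or attribute can be swapped. `tag_list_url` is a hypothetical helper, and whether KonText accepts other corpus names (for example a hypothetical `intercorp_v12_de`) is an unverified assumption.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Base endpoint taken from the Hungarian "List" link in the tagger table.
KONTEXT_WORDLIST = "https://kontext.korpus.cz/wordlist/result"


def tag_list_url(corpname: str, attr: str = "tag", min_freq: int = 1) -> str:
    """Return a KonText word-list URL with the same query as the Hungarian link.

    Only corpname, wlattr and wlminfreq vary; every other parameter is copied
    verbatim from that link. Other corpus names or attributes are assumptions.
    """
    params = [
        ("wlnums", "frq"),
        ("wlpat", ".*"),  # presumably a regular expression matching every value
        ("blhash", ""),
        ("include_nonwords", "0"),
        ("wlsort", "f"),
        ("corpname", corpname),
        ("wlattr", attr),  # attribute whose values are listed
        ("usesubcorp", ""),
        ("wlminfreq", str(min_freq)),
        ("wlhash", ""),
    ]
    # urlencode escapes "*" as %2A; standard percent-decoding turns it back into ".*".
    return KONTEXT_WORDLIST + "?" + urlencode(params)


if __name__ == "__main__":
    url = tag_list_url("intercorp_v12_hu")
    print(url)
    # Decode the query again to show the parameters that will be sent.
    print(parse_qsl(urlsplit(url).query, keep_blank_values=True))
```

The parameter order and empty values are kept as in the original link so that a decoded comparison of the two URLs matches exactly.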
  
  
Line 218: Line 220:
   * [[http://ufal.mff.cuni.cz/morfflex|MorfFlex]], [[http://ufal.mff.cuni.cz/morce/index.php|Morče]] and [[https://is.cuni.cz/webapps/zzp/download/140018093/?back_id=10|LanGr]] for Czech
   * [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]] for Bulgarian, Dutch, English, Estonian (thanks to Helmut Schmid), French, Italian, Portuguese (thanks to Pablo Gamallo), Russian and Spanish
-  * [[http://sgjp.pl/morfeusz/|Morfeusz]] and [[http://nlp.pwr.wroc.pl/takipi/|TaKIPI]] for Polish
+  * [[http://sgjp.pl/morfeusz/|Morfeusz]] and [[https://github.com/kwrobel-nlp/krnnt|KRNNT]] for Polish
   * [[http://code.google.com/p/hunpos/|HunPOS]] for Hungarian and other languages
   * [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Tagger for Slovak]] (thanks to Radovan Garabík)
Line 229: Line 231:
   * [[https://peteris.rocks/blog/latvian-part-of-speech-tagging/|LVTagger]] for Latvian (thanks to Pēteris Paikens and Michal Škrabal)
   * [[http://ufal.mff.cuni.cz/udpipe|UD Pipe]] for Belarusian and Ukrainian (thanks to Bohdan Moskalevskyi)
+  * [[https://taku910.github.io/mecab/|MeCab]] and [[https://osdn.net/projects/unidic/|Unidic]] for Japanese (thanks to Adam Nohejl)
+  * [[https://www.sutd.edu.sg/cmsresource/faculty/yuezhang/zpar.html|ZPar]] for Chinese (thanks to Vlastimil Dobečka)
  
  
Line 235: Line 239:
  
 <WRAP round box 51%>
-[[en:cnk:intercorp|InterCorp]] • [[en:cnk:intercorp:verze10|Version 10]] • [[en:cnk:intercorp:verze9|Version 9]] • [[en:cnk:intercorp:verze8|Version 8]] • [[en:cnk:intercorp:verze7|Version 7]] • [[en:cnk:intercorp:verze6|Version 6]] • [[en:cnk:intercorp:verze5|Version 5]] • [[en:cnk:intercorp:verze4|Verze 4]] • [[en:cnk:intercorp:verze3|Version 3]] • [[en:cnk:intercorp:historie|Version history]]
+[[en:cnk:intercorp|InterCorp]] • [[en:cnk:intercorp:verze11|Version 11]] • [[en:cnk:intercorp:verze10|Version 10]] • [[en:cnk:intercorp:verze9|Version 9]] • [[en:cnk:intercorp:verze8|Version 8]] • [[en:cnk:intercorp:verze7|Version 7]] • [[en:cnk:intercorp:verze6|Version 6]] • [[en:cnk:intercorp:verze5|Version 5]] • [[en:cnk:intercorp:verze4|Verze 4]] • [[en:cnk:intercorp:verze3|Version 3]] • [[en:cnk:intercorp:historie|Version history]]
 
-See [[http://ucnk.ff.cuni.cz/intercorp/?lang=en|the original InterCorp site in English]].
+See [[https://intercorp.korpus.cz/?lang=en|the original InterCorp site in English]].
 </WRAP>