Differences

This shows you the differences between two versions of the page.

en:cnk:intercorp:verze11 [2019/12/20 00:22] – [InterCorp Release 12] alexandrrosen
en:cnk:intercorp:verze11 [2019/12/20 11:11] (current) – old revision restored (2019/11/07 23:10) michalkren
Line 1: Line 1:
 ~~NOTOC~~
-====== InterCorp Release 12 ======
+====== InterCorp Release 11 ======
 
 ^ Name ^^ Czech -- core ^ Czech -- collections ^ other -- core ^ other -- collections ^
-^ Positions ^ Number of tokens |  137 059 021 |  116 673 027 |  373 873 819 |  1 549 570 665 |
+^ Positions ^ Number of tokens |  132,508,429 |  115,574,528 |  340,554,768 |  1,550,923,096 |
-^ ::: ^ Number of word forms |  110 588 784 |  89 819 765 |  310 914 295 |  1 222 868 666 |
+^ ::: ^ Number of word forms |  106,898,538 |  88,872,779 |  283,075,338 |  1,225,361,750 |
-^ Structural attributes ^ Number of documents |  1 619 |  30 |  3 806 |  281 |
+^ Structural attributes ^ Number of documents |  1,564 |  28 |  3,494 |  261 |
-^ ::: ^ Number of texts |  619 |  111 951 |  3 806 |  1 843 489 |
+^ ::: ^ Number of texts |  1,507 |  111,672 |  3,232 |  1,841,341 |
-^ ::: ^ Number of sentences |  9 518 229 |  13 606 183 |  23 076 128 |  143 165 959 |
+^ ::: ^ Number of sentences |  9,193,433 |  13,556,382 |  21,000,997 |  142,734,659 |
 ^ Further information ^ reference |  YES   ^^^^
 ^ ::: ^ representative |  NO  ^^^^
-^ ::: ^ publication date |  2019  ^^^^
+^ ::: ^ publication date |  2018  ^^^^
-^ ::: ^ foreign languages |  40  ^^^^
+^ ::: ^ foreign languages |  39  ^^^^
 ^ ::: ^ tagged languages |  26  ^^^^
 ^ ::: ^ lemmatized languages |  25  ^^^^
Line 54: Line 54:
  
  
-[{{:cnk:intercorp:intercorp_wordcounts_v12.png|Setup of the parallel corpus – the core and collections}}]
+[{{:cnk:intercorp:intercorp_wordcounts_v11.png|Setup of the parallel corpus – the core and collections}}]
 
-[{{:cnk:intercorp:intercorp_wordcounts2_v12.png|Setup of the parallel corpus – the core}}]
+[{{:cnk:intercorp:intercorp_wordcounts2_v11.png|Setup of the parallel corpus – the core}}]
 
-[{{:cnk:intercorp:intercorp_wordcounts3_v12.png|Setup of the parallel corpus – collections}}]
+[{{:cnk:intercorp:intercorp_wordcounts3_v11.png|Setup of the parallel corpus – collections}}]
 
 ===== Corpus size in thousands of words =====
 
 ^ Language ^^ Core ^ Syndicate ^ Presseurop ^ Acquis ^ Europarl ^ Subtitles ^ Bible ^ Total ^
-|  ar  | Arabic |  34 |  0 |  0 |  0 |  0 |  0 |  0 |  34 |
+| ar | Arabic |  34 |  0 |  0 |  0 |  0 |  0 |  0 |  34 |
-|  be  | Belarusian |  5 319 |  0 |  0 |  0 |  0 |  0 |  0 |  5 319 |
+| be | Belarusian |  4,426 |  0 |  0 |  0 |  0 |  0 |  0 |  4,426 |
-|  bg  | Bulgarian |  7 068 |  0 |  0 |  13 577 |  9 083 |  0 |  0 |  29 728 |
+| bg | Bulgarian |  6,780 |  0 |  0 |  13,577 |  9,083 |  0 |  0 |  29,441 |
-|  ca  | Catalan |  7 481 |  0 |  0 |  0 |  0 |  0 |  736 |  8 217 |
+| ca | Catalan |  5,596 |  0 |  0 |  0 |  0 |  0 |  736 |  6,332 |
-|  da  | Danish |  6 654 |  0 |  0 |  20 313 |  13 916 |  14 429 |  657 |  55 968 |
+| da | Danish |  5,595 |  0 |  0 |  20,313 |  13,916 |  14,429 |  657 |  54,910 |
-|  de  | German |  36 373 |  4 704 |  2 483 |  20 610 |  13 088 |  8 392 |  724 |  86 374 |
+| de | German |  34,915 |  4,457 |  2,483 |  20,610 |  13,088 |  8,392 |  724 |  84,669 |
-|  el  | Greek |  0 |  0 |  0 |  23 853 |  15 404 |  23 709 |  0 |  62 966 |
+| el | Greek |  0 |  0 |  0 |  23,853 |  15,404 |  23,709 |  0 |  62,966 |
-|  en  | English |  32 152 |  4 856 |  2 670 |  22 902 |  15 576 |  52 105 |  730 |  130 992 |
+| en | English |  27,968 |  4,604 |  2,670 |  22,902 |  15,576 |  52,105 |  730 |  126,555 |
-|  es  | Spanish |  25 595 |  5 614 |  2 859 |  26 262 |  16 249 |  36 650 |  0 |  113 228 |
+| es | Spanish |  23,349 |  5,322 |  2,859 |  26,262 |  16,249 |  36,650 |  0 |  110,691 |
-|  et  | Estonian |  0 |  0 |  0 |  14 896 |  10 899 |  10 298 |  0 |  36 093 |
+| et | Estonian |  0 |  0 |  0 |  14,896 |  10,899 |  10,298 |  0 |  36,093 |
-|  fi  | Finnish |  5 329 |  0 |  0 |  15 269 |  10 108 |  15 047 |  543 |  46 296 |
+| fi | Finnish |  4,585 |  0 |  0 |  15,489 |  10,175 |  15,098 |  544 |  45,890 |
-|  fr  | French |  18 241 |  5 600 |  3 046 |  26 200 |  17 179 |  25 986 |  764 |  97 016 |
+| fr | French |  17,213 |  5,391 |  3,046 |  26,200 |  17,179 |  25,986 |  764 |  95,779 |
-|  he  | Hebrew |  0 |  0 |  0 |  0 |  0 |  16 221 |  0 |  16 221 |
+| he | Hebrew |  0 |  0 |  0 |  0 |  0 |  16,221 |  0 |  16,221 |
-|  hi  | Hindi |  409 |  0 |  0 |  0 |  0 |  0 |  0 |  409 |
+| hi | Hindu |  409 |  0 |  0 |  0 |  0 |  0 |  0 |  409 |
-|  hr  | Croatian |  21 027 |  0 |  0 |  0 |  0 |  19 048 |  571 |  40 646 |
+| hr | Croatian |  20,147 |  0 |  0 |  0 |  0 |  19,048 |  571 |  39,767 |
-|  hu  | Hungarian |  5 783 |  0 |  0 |  17 852 |  12 198 |  21 115 |  0 |  56 948 |
+| hu | Hungarian |  5 783 |  0 |  0 |  17 852 |  12 198 |  21 115 |  0 |  56 948 |
-|  is  | Icelandic |  0 |  0 |  0 |  0 |  0 |  1 581 |  0 |  1 581 |
+| is | Icelandic |  0 |  0 |  0 |  0 |  0 |  1,581 |  0 |  1,581 |
-|  it  | Italian |  13 251 |  1 252 |  2 747 |  23 771 |  15 494 |  14 700 |  684 |  71 899 |
+| it | Italian |  11,400 |  1,141 |  2,747 |  23,771 |  15,494 |  14,700 |  684 |  69,937 |
-|  ja  | Japanese |  1 747 |  0 |  0 |  0 |  0 |  477 |  0 |  2 224 |
+| ja | Japanese |  1,198 |  0 |  0 |  0 |  0 |  477 |  0 |  1,675 |
-|  lt  | Lithuanian |  421 |  0 |  0 |  17 316 |  11 213 |  558 |  471 |  29 979 |
+| lt | Lithuanian |  287 |  0 |  0 |  17,316 |  11,213 |  558 |  471 |  29,844 |
-|  lv  | Latvian |  2 646 |  0 |  0 |  17 522 |  11 682 |  280 |  135 |  32 265 |
+| lv | Latvian |  2,523 |  0 |  0 |  17,522 |  11,682 |  280 |  |  32,008 |
-|  mk  | Macedonian |  8 000 |  0 |  0 |  0 |  0 |  1 877 |  0 |  9 877 |
+| mk | Macedonian |  6,508 |  0 |  0 |  0 |  0 |  1,877 |  0 |  8,385 |
-|  ms  | Malay |  0 |  0 |  0 |  0 |  0 |  3 521 |  0 |  3 521 |
+| ms | Malay |  0 |  0 |  0 |  0 |  0 |  3,521 |  0 |  3,521 |
-|  mt  | Maltese |  0 |  0 |  0 |  13 953 |  0 |  0 |  0 |  13 953 |
+| mt | Maltese |  0 |  0 |  0 |  13,953 |  0 |  0 |  0 |  13,953 |
-|  nl  | Dutch |  15 127 |  813 |  2 953 |  23 416 |  15 558 |  29 373 |  717 |  87 956 |
+| nl | Dutch |  13,689 |  711 |  2,953 |  23,416 |  15,558 |  29,373 |  717 |  86,416 |
-|  no  | Norwegian |  7 151 |  0 |  0 |  0 |  0 |  0 |  721 |  7 872 |
+| no | Norwegian |  6,675 |  0 |  0 |  0 |  0 |  0 |  721 |  7,397 |
-|  pl  | Polish |  25 606 |  0 |  2 380 |  19 604 |  12 817 |  26 575 |  583 |  87 567 |
+| pl | Polish |  24,292 |  0 |  2,378 |  19,594 |  12,811 |  26,572 |  583 |  86,230 |
-|  pt  | Portuguese |  4 095 |  554 |  2 782 |  24 598 |  15 193 |  41 468 |  706 |  89 396 |
+| pt | Portuguese |  4,032 |  520 |  3,000 |  27,301 |  16,485 |  43,392 |  760 |  95,489 |
-|  rn  | Romani |  14 |  0 |  0 |  0 |  0 |  0 |  0 |  14 |
+| rn | Romani |  14 |  0 |  0 |  0 |  0 |  0 |  0 |  14 |
-|  ro  | Romanian |  3 888 |  0 |  2 738 |  8 092 |  9 446 |  34 128 |  0 |  58 292 |
+| ro | Romanian |  3,888 |  0 |  2,738 |  8,092 |  9,446 |  34,128 |  0 |  58,292 |
-|  ru  | Russian |  8 123 |  3 984 |  0 |  0 |  0 |  6 887 |  565 |  19 560 |
+| ru | Russian |  7,062 |  3,768 |  0 |  0 |  0 |  6,887 |  565 |  18,282 |
-|  sk  | Slovak |  8 543 |  0 |  0 |  18 399 |  12 726 |  5 133 |  561 |  45 363 |
+| sk | Slovak |  8,545 |  0 |  0 |  18,401 |  12,734 |  5,134 |  561 |  45,376 |
-|  sl  | Slovene |  3 740 |  0 |  0 |  18 528 |  12 251 |  17 061 |  0 |  51 580 |
+| sl | Slovenian |  3,534 |  0 |  0 |  18,485 |  12,241 |  17,023 |  0 |  51,282 |
-|  sq  | Albanian |  0 |  0 |  0 |  0 |  0 |  2 003 |  0 |  2 003 |
+| sq | Albanian |  0 |  0 |  0 |  0 |  0 |  2,003 |  0 |  2,003 |
-|  sr  | Serbian |  10 961 |  0 |  0 |  0 |  0 |  20 727 |  0 |  31 688 |
+| sr | Serbian |  10,661 |  0 |  0 |  0 |  0 |  20,727 |  0 |  31,388 |
-|  sv  | Swedish |  15 320 |  0 |  0 |  19 542 |  13 784 |  14 666 |  638 |  63 950 |
+| sv | Swedish |  12,396 |  0 |  0 |  19,609 |  13,840 |  14,694 |  638 |  61,178 |
-|  tr  | Turkish |  0 |  0 |  0 |  0 |  0 |  21 190 |  0 |  21 190 |
+| tr | Turkish |  0 |  0 |  0 |  0 |  0 |  21,190 |  0 |  21,190 |
-|  uk  | Ukrainian |  10 817 |  0 |  0 |  0 |  0 |  244 |  596 |  11 657 |
+| uk | Ukrainian |  9,571 |  0 |  0 |  0 |  0 |  245 |  596 |  10,411 |
-|  vi  | Vietnamese |  0 |  0 |  0 |  0 |  0 |  1 474 |  0 |  1 474 |
+| vi | Vietnamese |  0 |  0 |  0 |  0 |  0 |  1,474 |  0 |  1,474 |
-|  zh  | Chinese |  0 |  240 |  0 |  0 |  0 |  2 246 688 |  0 |  2 487 |
+| **Subtotal** |   |  283,075 |  30,044 |  27,189 |  428,621 |  278,178 |  539,250 |  11,593 |  1,676 293 |
-| **Total** |  |  303 772 |  27 616 |  24 658 |  406 459 |  263 864 |  489 170 |  11 102 |  1 526 633 |
+| cs |  Czech |  106,899 |  4,124 |  2,310 |  19,085 |  12,188 |  50,604 |  562 |  195,771 |
-|  cs  | Czech |  110 573 |  4 351 |  2 310 |  19 085 |  12 908 |  50 604 |  562 |  200 393 |
+| **TOTAL** |   |  389,974 |  30,073 |  27,184 |  428,482 |  277,458 |  539,489 |  11,585 |  1,704,208 |
-| **TOTAL** |  |  414 345 |  31 967 |  26 968 |  425 543 |  276 772 |  539 774 |  11 664 |  1 727 026 |
  
 N.B.: Each Czech text is counted only once, even though it may have more than one foreign counterpart.
Line 114: Line 113:
 ^  Language  ^  Tags  ^  Lemmas  ^  Brief description  ^  Detailed description  ^  Tool  ^
 ^ Belarusian |  ✔  |   ✔        [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%)  |  [[https://web.archive.org/web/20170122231904/http://lindat.mff.cuni.cz/services/udpipe/api-reference.php|UDPipe]]  |
-^ Bulgarian |  ✔  |   ✔    |  [[https://www.sketchengine.eu/bulgarian-treebank-part-of-speech-tagset/|in English]]  |  [[http://bultreebank.org/en/resources/short-description-dependency-part-bultreebank-bultreebank-dp/btb-tr03-2/|in English]]  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
+^ Bulgarian |  ✔  |   ✔    |     |  [[http://bultreebank.org/en/resources/short-description-dependency-part-bultreebank-bultreebank-dp/btb-tr03-2/|in English]]  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
 ^ Catalan |  ✔  |  ✔  |  [[http://clic.ub.edu/corpus/webfm_send/18|in English]]  |      [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
-^ Chinese |  ✔  |    |  [[https://www.sketchengine.eu/chinese-penn-treebank-part-of-speech-tagset/|in English]]  |  [[https://repository.upenn.edu/cgi/viewcontent.cgi?article=1039&context=ircs_reports|in English]]  |  [[https://www.sutd.edu.sg/cmsresource/faculty/yuezhang/zpar.html|ZPar v0.7.5]]  |
 ^ Croatian |  ✔  |  ✔  |   [[https://github.com/ffnlp/sethr/blob/master/mte4r-upos.mapping|in English]]  |      [[https://github.com/uzh/reldi|ReLDI Tagger]]   |
 ^ Czech |  ✔  |  ✔  |  [[http://wiki.korpus.cz/doku.php/seznamy:tagy|in Czech]] and [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html|English]]  |  [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/docc0pos.pdf|in English]]  |  [[http://ufal.mff.cuni.cz/morce/index.php|Morče]]  |
-^ Dutch |  ✔  |   ✔    |   [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/dutch-tagset.txt|in English]]   |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
+^ Dutch |  ✔  |   ✔    |   [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/dutch-tagset.txt|in English]]  [[http://www.inl.nl/tst-centrale/images/stories/producten/documentatie/ehc_handleiding_nl.pdf|in Dutch]]  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
 ^ English |  ✔    ✔  |  [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/PennTreebankTags.pdf|in English]]  | [[http://utkl.ff.cuni.cz/%7Erosen/public/Penn-Treebank-Tagset.pdf|in English]] + [[http://utkl.ff.cuni.cz/%7Erosen/public/PennTagAdd.html|additions]]  |  [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]]  |
 ^ Estonian |  ✔  |  ✔  |  [[http://www.cl.ut.ee/korpused/morfliides/seletus| in Estonian and English]]  |      [[http://http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
-^ Finnish |  ✔  |  ✔  |  [[https://www.sketchengine.co.uk/finntreebank/|in English]]%%*%%)  |  [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/sources/FinnTreeBankManual.pdf|in English]]%%*%%)  |  [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/omorfi/README.shtml|OMorFi]] + [[https://code.google.com/archive/p/hunpos/|HunPOS]]  |
+^ Finnish |  ✔  |  ✔  |  [[https://www.sketchengine.co.uk/finntreebank/|in English]]%%*%%)  |  [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/sources/FinnTreeBankManual.pdf|in English]]%%*%%)  |  [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/omorfi/README.shtml|OMorFi]] +[[https://code.google.com/archive/p/hunpos/|HunPOS]]  |
 ^ French |  ✔  |  ✔  |  [[http://www.ims.uni-stuttgart.de/%7Eschmid/french-tagset.html|in English]]  |      [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
 ^ German |  ✔  |  ✔  |  [[https://www.sketchengine.co.uk/German-rftagger-part-of-speech-tagset/|in English]]%%**%%  |  [[http://utkl.ff.cuni.cz/%7Erosen/public/stts_guide.pdf|in German]]  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]]  |
-^ Hungarian |  ✔  |    [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v12_hu&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=|List]]  |  [[http://www.inf.u-szeged.hu/projectdirs/hlt/en/Szeged%20Treebank%202.0_en.html|in English]]   [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]]  |
+^ Hungarian |  ✔  |         [[http://nl.ijs.si/ME/Vault/V3/msd/html/msd.html#SECTION05400000000000000000|in English]]  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]]  |
 ^ Icelandic |  ✔  |  ✔  |  [[http://www.malfong.is/files/ot_tagset_files_en.pdf|in English]]        [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|IceStagger]]  |
 ^ Italian |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-tagset.txt|in English]]  |      [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
-^ Japanese |  ✔  |  ✔  |  [[https://www.sketchengine.eu/tagset-jp-mecab/|in English]]        [[https://taku910.github.io/mecab/|MeCab]] + [[https://unidic.ninjal.ac.jp|Unidic]]  |
+^ Japanese |  ✔  |  ✔  |  [[https://www.sketchengine.eu/tagset-jp-mecab/|in English]]        [[https://taku910.github.io/mecab/|MeCab]]  |
 ^ Latvian |  ✔  |  ✔  |   [[http://www.semti-kamols.lv/doc_upl/TagSet.html|in Latvian]]  |      [[https://peteris.rocks/blog/latvian-part-of-speech-tagging|LVTagger]]  |
 ^ Norwegian |  ✔  |  ✔  | [[http://tekstlab.uio.no/obt-ny/english/tagset.html|in English]] and [[http://tekstlab.uio.no/obt-ny/index.html|Norwegian]]  |      [[https://visl.sdu.dk/remoting.html|VISL]]  |
-^ Polish |  ✔  |  ✔  |  [[http://nkjp.pl/poliqarp/help/ense2.html#x3-20002|in English]] and [[http://nkjp.pl/poliqarp/help/plse2.html#x3-20002|Polish]]  |  [[http://nlp.ipipan.waw.pl/%7Eadamp/Papers/2003-eacl-ws12/|in English]]  |  [[http://sgjp.pl/morfeusz/|Morfeusz]] [[https://github.com/kwrobel-nlp/krnnt|KRNNT]]   |
+^ Polish |  ✔  |  ✔  |  [[http://nkjp.pl/poliqarp/help/ense2.html#x3-20002|in English]] and [[http://nkjp.pl/poliqarp/help/plse2.html#x3-20002|Polish]]  |  [[http://nlp.ipipan.waw.pl/%7Eadamp/Papers/2003-eacl-ws12/|in English]]  |  [[http://sgjp.pl/morfeusz/|Morfeusz]][[http://nlp.pwr.wroc.pl/takipi/|TaKIPI]]  |
 ^ Portuguese |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/Portuguese-Tagset.html|in Spanish]]  |      [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
 ^ Russian |  ✔  |  ✔  |  [[http://corpus.leeds.ac.uk/mocky/ru-table.tab|in English]]  |  [[http://nl.ijs.si/ME/V4/msd/html/msd-ru.html|in English]]%%***%%  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
-^ Slovak |  ✔  |  ✔  |  [[https://korpus.sk/morpho_en.html/|in English]]  |  [[https://korpus.sk/attachments/morpho_en/tagset-www.pdf|in Slovak]]  |  [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Radovan Garabík, Morče]]  |
+^ Slovak |  ✔  |  ✔  |  [[http://korpus.sk/morpho.html/|in Slovak]]  |  [[http://korpus.sk/attachments/publications/2004-garabik-gianitsova-horak-simkova-tokenizacia.pdf|in Slovak]]  |  [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Radovan Garabík, Morče]]  |
-^ Slovene |  ✔  |  ✔  |  [[https://www.sketchengine.eu/slovene-tagset-multext-east-v3/|in English]]  |  [[http://nl.ijs.si/ME/V4/msd/html/msd-sl.html|in English]]  |  [[http://nl2.ijs.si/analyze/|ToTaLe]]  |
+^ Slovene |  ✔  |  ✔  |    |  [[http://nl.ijs.si/ME/V4/msd/html/msd-sl.html|in English]]  |  [[http://nl2.ijs.si/analyze/|ToTaLe]]  |
-^ Serbian |  ✔  |  ✔  |  [[https://www.sketchengine.eu/multext-east-serbian-part-of-speech-tagset/|in English]]  |   [[http://nl.ijs.si/ME/V4/msd/html/msd-sr.html|in English]]  |  [[https://github.com/uzh/reldi|ReLDI Tagger]]   |
+^ Serbian |  ✔  |  ✔  |     |   [[http://nl.ijs.si/ME/V4/msd/html/msd-sr.html|in English]]  |  [[https://github.com/uzh/reldi|ReLDI Tagger]]   |
 ^ Spanish |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/spanish-tagset.txt|in English]]  |      [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
 ^ Swedish |  ✔  |  ✔  |  [[http://spraakbanken.gu.se/korp/markup/msdtags.html|in Swedish and English]]        [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|Stagger]]  |
-^ Ukrainian |  ✔  |  ✔    [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%)   |  [[https://web.archive.org/web/20170122231904/http://lindat.mff.cuni.cz/services/udpipe/api-reference.php|UDPipe]]  |
+^ Ukrainian |  ✔  |  ✔  |  [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%)        |  [[https://web.archive.org/web/20170122231904/http://lindat.mff.cuni.cz/services/udpipe/api-reference.php|UDPipe]]  |
  
  
Line 220: Line 218:
   * [[http://ufal.mff.cuni.cz/morfflex|MorfFlex]], [[http://ufal.mff.cuni.cz/morce/index.php|Morče]] and [[https://is.cuni.cz/webapps/zzp/download/140018093/?back_id=10|LanGr]] for Czech
   * [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]] for Bulgarian, Dutch, English, Estonian (thanks to Helmut Schmid), French, Italian, Portuguese (thanks to Pablo Gamallo), Russian and Spanish
-  * [[http://sgjp.pl/morfeusz/|Morfeusz]] and [[https://github.com/kwrobel-nlp/krnnt|KRNNT]] for Polish
+  * [[http://sgjp.pl/morfeusz/|Morfeusz]] and [[http://nlp.pwr.wroc.pl/takipi/|TaKIPI]] for Polish
   * [[http://code.google.com/p/hunpos/|HunPOS]] for Hungarian and other languages
   * [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Tagger for Slovak]] (thanks to Radovan Garabík)
Line 231: Line 229:
   * [[https://peteris.rocks/blog/latvian-part-of-speech-tagging/|LVTagger]] for Latvian (thanks to Pēteris Paikens and Michal Škrabal)
   * [[http://ufal.mff.cuni.cz/udpipe|UD Pipe]] for Belarusian and Ukrainian (thanks to Bohdan Moskalevskyi)
-  * [[https://taku910.github.io/mecab/|MeCab]] and [[https://osdn.net/projects/unidic/|Unidic]] for Japanese (thanks to Adam Nohejl)
+  * [[https://taku910.github.io/mecab/|MeCab]] and [[https://osdn.net/projects/unidic/|Unidic]] for Japanese
-  * [[https://www.sutd.edu.sg/cmsresource/faculty/yuezhang/zpar.html|ZPar]] for Chinese (thanks to Vlastimil Dobečka)
  
  
Line 239: Line 236:
  
 <WRAP round box 51%>
-[[en:cnk:intercorp|InterCorp]] • [[en:cnk:intercorp:verze11|Version 11]] • [[en:cnk:intercorp:verze10|Version 10]] • [[en:cnk:intercorp:verze9|Version 9]] • [[en:cnk:intercorp:verze8|Version 8]] • [[en:cnk:intercorp:verze7|Version 7]] • [[en:cnk:intercorp:verze6|Version 6]] • [[en:cnk:intercorp:verze5|Version 5]] • [[en:cnk:intercorp:verze4|Verze 4]] • [[en:cnk:intercorp:verze3|Version 3]] • [[en:cnk:intercorp:historie|Version history]]
+[[en:cnk:intercorp|InterCorp]] • [[en:cnk:intercorp:verze10|Version 10]] • [[en:cnk:intercorp:verze9|Version 9]] • [[en:cnk:intercorp:verze8|Version 8]] • [[en:cnk:intercorp:verze7|Version 7]] • [[en:cnk:intercorp:verze6|Version 6]] • [[en:cnk:intercorp:verze5|Version 5]] • [[en:cnk:intercorp:verze4|Verze 4]] • [[en:cnk:intercorp:verze3|Version 3]] • [[en:cnk:intercorp:historie|Version history]]
 
-See [[https://intercorp.korpus.cz/?lang=en|the original InterCorp site in English]].
+See [[http://ucnk.ff.cuni.cz/intercorp/?lang=en|the original InterCorp site in English]].
 </WRAP>