en:cnk:intercorp:verze11 [2019/12/20 00:22]
Alexandr Rosen [InterCorp Release 12]
en:cnk:intercorp:verze11 [2019/12/20 11:11] (current)
Michal Křen old revision restored (2019/11/07 23:10)
~~NOTOC~~
====== InterCorp Release 11 ======

^ Name ^^ Czech -- core ^ Czech -- collections ^ other -- core ^ other -- collections ^
^ Positions ^ Number of tokens |  132,508,429 |  115,574,528 |  340,554,768 |  1,550,923,096 |
^ ::: ^ Number of word forms |  106,898,538 |  88,872,779 |  283,075,338 |  1,225,361,750 |
^ Structural attributes ^ Number of documents |  1,564 |  28 |  3,494 |  261 |
^ ::: ^ Number of texts |  1,507 |  111,672 |  3,232 |  1,841,341 |
^ ::: ^ Number of sentences |  9,193,433 |  13,556,382 |  21,000,997 |  142,734,659 |
^ Further information ^ reference |  YES  ^^^^
^ ::: ^ representative |  NO  ^^^^
^ ::: ^ publication date |  2018  ^^^^
^ ::: ^ foreign languages |  39  ^^^^
^ ::: ^ tagged languages |  26  ^^^^
^ ::: ^ lemmatized languages |  25  ^^^^
  
  
[{{:cnk:intercorp:intercorp_wordcounts_v11.png|Setup of the parallel corpus – the core and collections}}]

[{{:cnk:intercorp:intercorp_wordcounts2_v11.png|Setup of the parallel corpus – the core}}]

[{{:cnk:intercorp:intercorp_wordcounts3_v11.png|Setup of the parallel corpus – collections}}]
  
===== Corpus size in thousands of words =====

^ Language ^^ Core ^ Syndicate ^ Presseurop ^ Acquis ^ Europarl ^ Subtitles ^ Bible ^ Total ^
| ar | Arabic |  34 |  0 |  0 |  0 |  0 |  0 |  0 |  34 |
| be | Belarusian |  4,426 |  0 |  0 |  0 |  0 |  0 |  0 |  4,426 |
| bg | Bulgarian |  6,780 |  0 |  0 |  13,577 |  9,083 |  0 |  0 |  29,441 |
| ca | Catalan |  5,596 |  0 |  0 |  0 |  0 |  0 |  736 |  6,332 |
| da | Danish |  5,595 |  0 |  0 |  20,313 |  13,916 |  14,429 |  657 |  54,910 |
| de | German |  34,915 |  4,457 |  2,483 |  20,610 |  13,088 |  8,392 |  724 |  84,669 |
| el | Greek |  0 |  0 |  0 |  23,853 |  15,404 |  23,709 |  0 |  62,966 |
| en | English |  27,968 |  4,604 |  2,670 |  22,902 |  15,576 |  52,105 |  730 |  126,555 |
| es | Spanish |  23,349 |  5,322 |  2,859 |  26,262 |  16,249 |  36,650 |  0 |  110,691 |
| et | Estonian |  0 |  0 |  0 |  14,896 |  10,899 |  10,298 |  0 |  36,093 |
| fi | Finnish |  4,585 |  0 |  0 |  15,489 |  10,175 |  15,098 |  544 |  45,890 |
| fr | French |  17,213 |  5,391 |  3,046 |  26,200 |  17,179 |  25,986 |  764 |  95,779 |
| he | Hebrew |  0 |  0 |  0 |  0 |  0 |  16,221 |  0 |  16,221 |
| hi | Hindi |  409 |  0 |  0 |  0 |  0 |  0 |  0 |  409 |
| hr | Croatian |  20,147 |  0 |  0 |  0 |  0 |  19,048 |  571 |  39,767 |
| hu | Hungarian |  5,783 |  0 |  0 |  17,852 |  12,198 |  21,115 |  0 |  56,948 |
| is | Icelandic |  0 |  0 |  0 |  0 |  0 |  1,581 |  0 |  1,581 |
| it | Italian |  11,400 |  1,141 |  2,747 |  23,771 |  15,494 |  14,700 |  684 |  69,937 |
| ja | Japanese |  1,198 |  0 |  0 |  0 |  0 |  477 |  0 |  1,675 |
| lt | Lithuanian |  287 |  0 |  0 |  17,316 |  11,213 |  558 |  471 |  29,844 |
| lv | Latvian |  2,523 |  0 |  0 |  17,522 |  11,682 |  280 |  |  32,008 |
| mk | Macedonian |  6,508 |  0 |  0 |  0 |  0 |  1,877 |  0 |  8,385 |
| ms | Malay |  0 |  0 |  0 |  0 |  0 |  3,521 |  0 |  3,521 |
| mt | Maltese |  0 |  0 |  0 |  13,953 |  0 |  0 |  0 |  13,953 |
| nl | Dutch |  13,689 |  711 |  2,953 |  23,416 |  15,558 |  29,373 |  717 |  86,416 |
| no | Norwegian |  6,675 |  0 |  0 |  0 |  0 |  0 |  721 |  7,397 |
| pl | Polish |  24,292 |  0 |  2,378 |  19,594 |  12,811 |  26,572 |  583 |  86,230 |
| pt | Portuguese |  4,032 |  520 |  3,000 |  27,301 |  16,485 |  43,392 |  760 |  95,489 |
| rn | Romani |  14 |  0 |  0 |  0 |  0 |  0 |  0 |  14 |
| ro | Romanian |  3,888 |  0 |  2,738 |  8,092 |  9,446 |  34,128 |  0 |  58,292 |
| ru | Russian |  7,062 |  3,768 |  0 |  0 |  0 |  6,887 |  565 |  18,282 |
| sk | Slovak |  8,545 |  0 |  0 |  18,401 |  12,734 |  5,134 |  561 |  45,376 |
| sl | Slovenian |  3,534 |  0 |  0 |  18,485 |  12,241 |  17,023 |  0 |  51,282 |
| sq | Albanian |  0 |  0 |  0 |  0 |  0 |  2,003 |  0 |  2,003 |
| sr | Serbian |  10,661 |  0 |  0 |  0 |  0 |  20,727 |  0 |  31,388 |
| sv | Swedish |  12,396 |  0 |  0 |  19,609 |  13,840 |  14,694 |  638 |  61,178 |
| tr | Turkish |  0 |  0 |  0 |  0 |  0 |  21,190 |  0 |  21,190 |
| uk | Ukrainian |  9,571 |  0 |  0 |  0 |  0 |  245 |  596 |  10,411 |
| vi | Vietnamese |  0 |  0 |  0 |  0 |  0 |  1,474 |  0 |  1,474 |
| **Subtotal** |   |  283,075 |  30,044 |  27,189 |  428,621 |  278,178 |  539,250 |  11,593 |  1,676,293 |
| cs | Czech |  106,899 |  4,124 |  2,310 |  19,085 |  12,188 |  50,604 |  562 |  195,771 |
| **TOTAL** |   |  389,974 |  30,073 |  27,184 |  428,482 |  277,458 |  539,489 |  11,585 |  1,704,208 |

N.B.: Each Czech text is counted only once, even though it may have more than one foreign counterpart.
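The size figures above can be cross-checked with some simple arithmetic. The sketch below is illustrative only. It rests on two assumptions that the table appears to support but never states. First, the Core column seems to be the "Number of word forms" count from the first table divided by 1,000 and rounded (Czech: 106,898,538 becomes 106,899). Second, a row's Total seems to be the plain sum of its sub-corpus columns, which holds for the German row. The sketch also illustrates the N.B. using hypothetical alignment records: the text ids `cs_001` and `cs_002` and their sizes are invented. Counting each Czech text once gives a smaller figure than adding the Czech side of every language pair.

```python
# Illustrative cross-checks for the "Corpus size in thousands of words" table.
# Assumptions (not stated on the page): Total = sum of the sub-corpus columns,
# and Core = number of word forms / 1000, rounded.

SUBCORPORA = ("Core", "Syndicate", "Presseurop", "Acquis", "Europarl", "Subtitles", "Bible")

# German row of the table, in thousands of words.
GERMAN = {
    "Core": 34_915, "Syndicate": 4_457, "Presseurop": 2_483, "Acquis": 20_610,
    "Europarl": 13_088, "Subtitles": 8_392, "Bible": 724,
}


def row_total(row):
    """Sum one language's sub-corpus columns."""
    return sum(row[name] for name in SUBCORPORA)


def to_thousands(word_forms):
    """Convert a raw word-form count to the table's unit (thousands of words)."""
    return round(word_forms / 1000)


# Hypothetical alignment records: (Czech text id, foreign language, Czech size in words).
ALIGNMENTS = [
    ("cs_001", "de", 1_200),
    ("cs_001", "en", 1_200),
    ("cs_001", "fr", 1_200),
    ("cs_002", "en", 800),
]


def czech_words_per_pair(records):
    """Naive count: the Czech side is added again for every language pair."""
    return sum(size for _text_id, _lang, size in records)


def czech_words_counted_once(records):
    """Count each Czech text once, however many foreign counterparts it has."""
    sizes = {text_id: size for text_id, _lang, size in records}
    return sum(sizes.values())


if __name__ == "__main__":
    print(row_total(GERMAN))                     # 84669, the German Total cell
    print(to_thousands(106_898_538))             # 106899, the Czech Core cell
    print(czech_words_per_pair(ALIGNMENTS))      # 4400: cs_001 counted three times
    print(czech_words_counted_once(ALIGNMENTS))  # 2000: each Czech text counted once
```

The dictionary in `czech_words_counted_once` is keyed by the Czech text id, so repeated alignments of the same text collapse into a single entry. This mirrors the counting rule stated in the N.B.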
^  Language  ^  Tags  ^  Lemmas  ^  Brief description  ^  Detailed description  ^  Tool  ^
^ Belarusian |  ✔  |  ✔  |     |  [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%)  |  [[https://web.archive.org/web/20170122231904/http://lindat.mff.cuni.cz/services/udpipe/api-reference.php|UDPipe]]  |
^ Bulgarian |  ✔  |  ✔  |     |  [[http://bultreebank.org/en/resources/short-description-dependency-part-bultreebank-bultreebank-dp/btb-tr03-2/|in English]]  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
^ Catalan |  ✔  |  ✔  |  [[http://clic.ub.edu/corpus/webfm_send/18|in English]]  |     |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
^ Croatian |  ✔  |  ✔  |  [[https://github.com/ffnlp/sethr/blob/master/mte4r-upos.mapping|in English]]  |     |  [[https://github.com/uzh/reldi|ReLDI Tagger]]  |
^ Czech |  ✔  |  ✔  |  [[http://wiki.korpus.cz/doku.php/seznamy:tagy|in Czech]] and [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html|English]]  |  [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/docc0pos.pdf|in English]]  |  [[http://ufal.mff.cuni.cz/morce/index.php|Morče]]  |
^ Dutch |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/dutch-tagset.txt|in English]]  |  [[http://www.inl.nl/tst-centrale/images/stories/producten/documentatie/ehc_handleiding_nl.pdf|in Dutch]]  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
^ English |  ✔  |  ✔  |  [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/PennTreebankTags.pdf|in English]]  |  [[http://utkl.ff.cuni.cz/%7Erosen/public/Penn-Treebank-Tagset.pdf|in English]] + [[http://utkl.ff.cuni.cz/%7Erosen/public/PennTagAdd.html|additions]]  |  [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]]  |
^ Estonian |  ✔  |  ✔  |  [[http://www.cl.ut.ee/korpused/morfliides/seletus|in Estonian and English]]  |     |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
^ Finnish |  ✔  |  ✔  |  [[https://www.sketchengine.co.uk/finntreebank/|in English]]%%*%%)  |  [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/sources/FinnTreeBankManual.pdf|in English]]%%*%%)  |  [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/omorfi/README.shtml|OMorFi]] + [[https://code.google.com/archive/p/hunpos/|HunPOS]]  |
^ French |  ✔  |  ✔  |  [[http://www.ims.uni-stuttgart.de/%7Eschmid/french-tagset.html|in English]]  |     |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
^ German |  ✔  |  ✔  |  [[https://www.sketchengine.co.uk/German-rftagger-part-of-speech-tagset/|in English]]%%**%%  |  [[http://utkl.ff.cuni.cz/%7Erosen/public/stts_guide.pdf|in German]]  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]]  |
^ Hungarian |  ✔  |     |  [[http://nl.ijs.si/ME/Vault/V3/msd/html/msd.html#SECTION05400000000000000000|in English]]  |     |  [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]]  |
^ Icelandic |  ✔  |  ✔  |  [[http://www.malfong.is/files/ot_tagset_files_en.pdf|in English]]  |     |  [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|IceStagger]]  |
^ Italian |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-tagset.txt|in English]]  |     |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
^ Japanese |  ✔  |  ✔  |  [[https://www.sketchengine.eu/tagset-jp-mecab/|in English]]  |     |  [[https://taku910.github.io/mecab/|MeCab]]  |
^ Latvian |  ✔  |  ✔  |  [[http://www.semti-kamols.lv/doc_upl/TagSet.html|in Latvian]]  |     |  [[https://peteris.rocks/blog/latvian-part-of-speech-tagging|LVTagger]]  |
^ Norwegian |  ✔  |  ✔  |  [[http://tekstlab.uio.no/obt-ny/english/tagset.html|in English]] and [[http://tekstlab.uio.no/obt-ny/index.html|Norwegian]]  |     |  [[https://visl.sdu.dk/remoting.html|VISL]]  |
^ Polish |  ✔  |  ✔  |  [[http://nkjp.pl/poliqarp/help/ense2.html#x3-20002|in English]] and [[http://nkjp.pl/poliqarp/help/plse2.html#x3-20002|Polish]]  |  [[http://nlp.ipipan.waw.pl/%7Eadamp/Papers/2003-eacl-ws12/|in English]]  |  [[http://sgjp.pl/morfeusz/|Morfeusz]] + [[http://nlp.pwr.wroc.pl/takipi/|TaKIPI]]  |
^ Portuguese |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/Portuguese-Tagset.html|in Spanish]]  |     |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
^ Russian |  ✔  |  ✔  |  [[http://corpus.leeds.ac.uk/mocky/ru-table.tab|in English]]  |  [[http://nl.ijs.si/ME/V4/msd/html/msd-ru.html|in English]]%%***%%  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
^ Slovak |  ✔  |  ✔  |  [[http://korpus.sk/morpho.html/|in Slovak]]  |  [[http://korpus.sk/attachments/publications/2004-garabik-gianitsova-horak-simkova-tokenizacia.pdf|in Slovak]]  |  [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Radovan Garabík, Morče]]  |
^ Slovene |  ✔  |  ✔  |     |  [[http://nl.ijs.si/ME/V4/msd/html/msd-sl.html|in English]]  |  [[http://nl2.ijs.si/analyze/|ToTaLe]]  |
^ Serbian |  ✔  |  ✔  |     |  [[http://nl.ijs.si/ME/V4/msd/html/msd-sr.html|in English]]  |  [[https://github.com/uzh/reldi|ReLDI Tagger]]  |
^ Spanish |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/spanish-tagset.txt|in English]]  |     |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
^ Swedish |  ✔  |  ✔  |  [[http://spraakbanken.gu.se/korp/markup/msdtags.html|in Swedish and English]]  |     |  [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|Stagger]]  |
^ Ukrainian |  ✔  |  ✔  |  [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%)  |     |  [[https://web.archive.org/web/20170122231904/http://lindat.mff.cuni.cz/services/udpipe/api-reference.php|UDPipe]]  |
  
  
  * [[http://ufal.mff.cuni.cz/morfflex|MorfFlex]], [[http://ufal.mff.cuni.cz/morce/index.php|Morče]] and [[https://is.cuni.cz/webapps/zzp/download/140018093/?back_id=10|LanGr]] for Czech
  * [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]] for Bulgarian, Dutch, English, Estonian (thanks to Helmut Schmid), French, Italian, Portuguese (thanks to Pablo Gamallo), Russian and Spanish
  * [[http://sgjp.pl/morfeusz/|Morfeusz]] and [[http://nlp.pwr.wroc.pl/takipi/|TaKIPI]] for Polish
  * [[http://code.google.com/p/hunpos/|HunPOS]] for Hungarian and other languages
  * [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Tagger for Slovak]] (thanks to Radovan Garabík)
  * [[https://peteris.rocks/blog/latvian-part-of-speech-tagging/|LVTagger]] for Latvian (thanks to Pēteris Paikens and Michal Škrabal)
  * [[http://ufal.mff.cuni.cz/udpipe|UDPipe]] for Belarusian and Ukrainian (thanks to Bohdan Moskalevskyi)
  * [[https://taku910.github.io/mecab/|MeCab]] and [[https://osdn.net/projects/unidic/|Unidic]] for Japanese
  
  
  
<WRAP round box 51%>
[[en:cnk:intercorp|InterCorp]] • [[en:cnk:intercorp:verze10|Version 10]] • [[en:cnk:intercorp:verze9|Version 9]] • [[en:cnk:intercorp:verze8|Version 8]] • [[en:cnk:intercorp:verze7|Version 7]] • [[en:cnk:intercorp:verze6|Version 6]] • [[en:cnk:intercorp:verze5|Version 5]] • [[en:cnk:intercorp:verze4|Version 4]] • [[en:cnk:intercorp:verze3|Version 3]] • [[en:cnk:intercorp:historie|Version history]]

See [[http://ucnk.ff.cuni.cz/intercorp/?lang=en|the original InterCorp site in English]].
</WRAP>