en:cnk:intercorp:verze11 [2019/12/20 00:22] – [InterCorp Release 12] alexandrrosen | en:cnk:intercorp:verze11 [2019/12/20 11:11] (current) – old revision restored (2019/11/07 23:10) michalkren |
---|
~~NOTOC~~ | ~~NOTOC~~ |
====== InterCorp Release 12 ====== | ====== InterCorp Release 11 ====== |
| |
^ Name ^^ Czech -- core ^ Czech -- collections ^ other -- core ^ other -- collections ^ | ^ Name ^^ Czech -- core ^ Czech -- collections ^ other -- core ^ other -- collections ^ |
^ Positions ^ Number of tokens | 137 059 021 | 116 673 027 | 373 873 819 | 1 549 570 665 | | ^ Positions ^ Number of tokens | 132,508,429 | 115,574,528 | 340,554,768 | 1,550,923,096 | |
^ ::: ^ Number of word forms | 110 588 784 | 89 819 765 | 310 914 295 | 1 222 868 666 | | ^ ::: ^ Number of word forms | 106,898,538 | 88,872,779 | 283,075,338 | 1,225,361,750 | |
^ Structural attributes ^ Number of documents | 1 619 | 30 | 3 806 | 281 | | ^ Structural attributes ^ Number of documents | 1,564 | 28 | 3,494 | 261 | |
^ ::: ^ Number of texts | 1 619 | 111 951 | 3 806 | 1 843 489 | | ^ ::: ^ Number of texts | 1,507 | 111,672 | 3,232 | 1,841,341 | |
^ ::: ^ Number of sentences | 9 518 229 | 13 606 183 | 23 076 128 | 143 165 959 | | ^ ::: ^ Number of sentences | 9,193,433 | 13,556,382 | 21,000,997 | 142,734,659 | |
^ Further information ^ reference | YES ^^^^ | ^ Further information ^ reference | YES ^^^^ |
^ ::: ^ representative | NO ^^^^ | ^ ::: ^ representative | NO ^^^^ |
^ ::: ^ publication date | 2019 ^^^^ | ^ ::: ^ publication date | 2018 ^^^^ |
^ ::: ^ foreign languages | 40 ^^^^ | ^ ::: ^ foreign languages | 39 ^^^^ |
^ ::: ^ tagged languages | 26 ^^^^ | ^ ::: ^ tagged languages | 26 ^^^^ |
^ ::: ^ lemmatized languages | 25 ^^^^ | ^ ::: ^ lemmatized languages | 25 ^^^^ |
| |
| |
[{{:cnk:intercorp:intercorp_wordcounts_v12.png|Setup of the parallel corpus – the core and collections}}] | [{{:cnk:intercorp:intercorp_wordcounts_v11.png|Setup of the parallel corpus – the core and collections}}] |
| |
[{{:cnk:intercorp:intercorp_wordcounts2_v12.png|Setup of the parallel corpus – the core}}] | [{{:cnk:intercorp:intercorp_wordcounts2_v11.png|Setup of the parallel corpus – the core}}] |
| |
[{{:cnk:intercorp:intercorp_wordcounts3_v12.png|Setup of the parallel corpus – collections}}] | [{{:cnk:intercorp:intercorp_wordcounts3_v11.png|Setup of the parallel corpus – collections}}] |
| |
===== Corpus size in thousands of words ===== | ===== Corpus size in thousands of words ===== |
| |
^ Language ^^ Core ^ Syndicate ^ Presseurop ^ Acquis ^ Europarl ^ Subtitles ^ Bible ^ Total ^ | ^ Language ^^ Core ^ Syndicate ^ Presseurop ^ Acquis ^ Europarl ^ Subtitles ^ Bible ^ Total ^ |
| ar | Arabic | 34 | 0 | 0 | 0 | 0 | 0 | 0 | 34 | | | ar | Arabic | 34 | 0 | 0 | 0 | 0 | 0 | 0 | 34 | |
| be | Belarusian | 5 319 | 0 | 0 | 0 | 0 | 0 | 0 | 5 319 | | | be | Belarusian | 4,426 | 0 | 0 | 0 | 0 | 0 | 0 | 4,426 | |
| bg | Bulgarian | 7 068 | 0 | 0 | 13 577 | 9 083 | 0 | 0 | 29 728 | | | bg | Bulgarian | 6,780 | 0 | 0 | 13,577 | 9,083 | 0 | 0 | 29,441 | |
| ca | Catalan | 7 481 | 0 | 0 | 0 | 0 | 0 | 736 | 8 217 | | | ca | Catalan | 5,596 | 0 | 0 | 0 | 0 | 0 | 736 | 6,332 | |
| da | Danish | 6 654 | 0 | 0 | 20 313 | 13 916 | 14 429 | 657 | 55 968 | | | da | Danish | 5,595 | 0 | 0 | 20,313 | 13,916 | 14,429 | 657 | 54,910 | |
| de | German | 36 373 | 4 704 | 2 483 | 20 610 | 13 088 | 8 392 | 724 | 86 374 | | | de | German | 34,915 | 4,457 | 2,483 | 20,610 | 13,088 | 8,392 | 724 | 84,669 | |
| el | Greek | 0 | 0 | 0 | 23 853 | 15 404 | 23 709 | 0 | 62 966 | | | el | Greek | 0 | 0 | 0 | 23,853 | 15,404 | 23,709 | 0 | 62,966 | |
| en | English | 32 152 | 4 856 | 2 670 | 22 902 | 15 576 | 52 105 | 730 | 130 992 | | | en | English | 27,968 | 4,604 | 2,670 | 22,902 | 15,576 | 52,105 | 730 | 126,555 | |
| es | Spanish | 25 595 | 5 614 | 2 859 | 26 262 | 16 249 | 36 650 | 0 | 113 228 | | | es | Spanish | 23,349 | 5,322 | 2,859 | 26,262 | 16,249 | 36,650 | 0 | 110,691 | |
| et | Estonian | 0 | 0 | 0 | 14 896 | 10 899 | 10 298 | 0 | 36 093 | | | et | Estonian | 0 | 0 | 0 | 14,896 | 10,899 | 10,298 | 0 | 36,093 | |
| fi | Finnish | 5 329 | 0 | 0 | 15 269 | 10 108 | 15 047 | 543 | 46 296 | | | fi | Finnish | 4,585 | 0 | 0 | 15,489 | 10,175 | 15,098 | 544 | 45,890 | |
| fr | French | 18 241 | 5 600 | 3 046 | 26 200 | 17 179 | 25 986 | 764 | 97 016 | | | fr | French | 17,213 | 5,391 | 3,046 | 26,200 | 17,179 | 25,986 | 764 | 95,779 | |
| he | Hebrew | 0 | 0 | 0 | 0 | 0 | 16 221 | 0 | 16 221 | | | he | Hebrew | 0 | 0 | 0 | 0 | 0 | 16,221 | 0 | 16,221 | |
| hi | Hindi | 409 | 0 | 0 | 0 | 0 | 0 | 0 | 409 | | | hi | Hindi | 409 | 0 | 0 | 0 | 0 | 0 | 0 | 409 | |
| hr | Croatian | 21 027 | 0 | 0 | 0 | 0 | 19 048 | 571 | 40 646 | | | hr | Croatian | 20,147 | 0 | 0 | 0 | 0 | 19,048 | 571 | 39,767 | |
| hu | Hungarian | 5 783 | 0 | 0 | 17 852 | 12 198 | 21 115 | 0 | 56 948 | | | hu | Hungarian | 5,783 | 0 | 0 | 17,852 | 12,198 | 21,115 | 0 | 56,948 | |
| is | Icelandic | 0 | 0 | 0 | 0 | 0 | 1 581 | 0 | 1 581 | | | is | Icelandic | 0 | 0 | 0 | 0 | 0 | 1,581 | 0 | 1,581 | |
| it | Italian | 13 251 | 1 252 | 2 747 | 23 771 | 15 494 | 14 700 | 684 | 71 899 | | | it | Italian | 11,400 | 1,141 | 2,747 | 23,771 | 15,494 | 14,700 | 684 | 69,937 | |
| ja | Japanese | 1 747 | 0 | 0 | 0 | 0 | 477 | 0 | 2 224 | | | ja | Japanese | 1,198 | 0 | 0 | 0 | 0 | 477 | 0 | 1,675 | |
| lt | Lithuanian | 421 | 0 | 0 | 17 316 | 11 213 | 558 | 471 | 29 979 | | | lt | Lithuanian | 287 | 0 | 0 | 17,316 | 11,213 | 558 | 471 | 29,844 | |
| lv | Latvian | 2 646 | 0 | 0 | 17 522 | 11 682 | 280 | 135 | 32 265 | | | lv | Latvian | 2,523 | 0 | 0 | 17,522 | 11,682 | 280 | 0 | 32,008 | |
| mk | Macedonian | 8 000 | 0 | 0 | 0 | 0 | 1 877 | 0 | 9 877 | | | mk | Macedonian | 6,508 | 0 | 0 | 0 | 0 | 1,877 | 0 | 8,385 | |
| ms | Malay | 0 | 0 | 0 | 0 | 0 | 3 521 | 0 | 3 521 | | | ms | Malay | 0 | 0 | 0 | 0 | 0 | 3,521 | 0 | 3,521 | |
| mt | Maltese | 0 | 0 | 0 | 13 953 | 0 | 0 | 0 | 13 953 | | | mt | Maltese | 0 | 0 | 0 | 13,953 | 0 | 0 | 0 | 13,953 | |
| nl | Dutch | 15 127 | 813 | 2 953 | 23 416 | 15 558 | 29 373 | 717 | 87 956 | | | nl | Dutch | 13,689 | 711 | 2,953 | 23,416 | 15,558 | 29,373 | 717 | 86,416 | |
| no | Norwegian | 7 151 | 0 | 0 | 0 | 0 | 0 | 721 | 7 872 | | | no | Norwegian | 6,675 | 0 | 0 | 0 | 0 | 0 | 721 | 7,397 | |
| pl | Polish | 25 606 | 0 | 2 380 | 19 604 | 12 817 | 26 575 | 583 | 87 567 | | | pl | Polish | 24,292 | 0 | 2,378 | 19,594 | 12,811 | 26,572 | 583 | 86,230 | |
| pt | Portuguese | 4 095 | 554 | 2 782 | 24 598 | 15 193 | 41 468 | 706 | 89 396 | | | pt | Portuguese | 4,032 | 520 | 3,000 | 27,301 | 16,485 | 43,392 | 760 | 95,489 | |
| rn | Romani | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 14 | | | rn | Romani | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 14 | |
| ro | Romanian | 3 888 | 0 | 2 738 | 8 092 | 9 446 | 34 128 | 0 | 58 292 | | | ro | Romanian | 3,888 | 0 | 2,738 | 8,092 | 9,446 | 34,128 | 0 | 58,292 | |
| ru | Russian | 8 123 | 3 984 | 0 | 0 | 0 | 6 887 | 565 | 19 560 | | | ru | Russian | 7,062 | 3,768 | 0 | 0 | 0 | 6,887 | 565 | 18,282 | |
| sk | Slovak | 8 543 | 0 | 0 | 18 399 | 12 726 | 5 133 | 561 | 45 363 | | | sk | Slovak | 8,545 | 0 | 0 | 18,401 | 12,734 | 5,134 | 561 | 45,376 | |
| sl | Slovene | 3 740 | 0 | 0 | 18 528 | 12 251 | 17 061 | 0 | 51 580 | | | sl | Slovenian | 3,534 | 0 | 0 | 18,485 | 12,241 | 17,023 | 0 | 51,282 | |
| sq | Albanian | 0 | 0 | 0 | 0 | 0 | 2 003 | 0 | 2 003 | | | sq | Albanian | 0 | 0 | 0 | 0 | 0 | 2,003 | 0 | 2,003 | |
| sr | Serbian | 10 961 | 0 | 0 | 0 | 0 | 20 727 | 0 | 31 688 | | | sr | Serbian | 10,661 | 0 | 0 | 0 | 0 | 20,727 | 0 | 31,388 | |
| sv | Swedish | 15 320 | 0 | 0 | 19 542 | 13 784 | 14 666 | 638 | 63 950 | | | sv | Swedish | 12,396 | 0 | 0 | 19,609 | 13,840 | 14,694 | 638 | 61,178 | |
| tr | Turkish | 0 | 0 | 0 | 0 | 0 | 21 190 | 0 | 21 190 | | | tr | Turkish | 0 | 0 | 0 | 0 | 0 | 21,190 | 0 | 21,190 | |
| uk | Ukrainian | 10 817 | 0 | 0 | 0 | 0 | 244 | 596 | 11 657 | | | uk | Ukrainian | 9,571 | 0 | 0 | 0 | 0 | 245 | 596 | 10,411 | |
| vi | Vietnamese | 0 | 0 | 0 | 0 | 0 | 1 474 | 0 | 1 474 | | | vi | Vietnamese | 0 | 0 | 0 | 0 | 0 | 1,474 | 0 | 1,474 | |
| zh | Chinese | 0 | 240 | 0 | 0 | 0 | 2 247 | 0 | 2 487 | | | **Subtotal** | | 283,075 | 30,044 | 27,189 | 428,621 | 278,178 | 539,250 | 11,593 | 1,676,293 | |
| **Total** | | 303 772 | 27 616 | 24 658 | 406 459 | 263 864 | 489 170 | 11 102 | 1 526 633 | | | cs | Czech | 106,899 | 4,124 | 2,310 | 19,085 | 12,188 | 50,604 | 562 | 195,771 | |
| cs | Czech | 110 573 | 4 351 | 2 310 | 19 085 | 12 908 | 50 604 | 562 | 200 393 | | | **TOTAL** | | 389,974 | 30,073 | 27,184 | 428,482 | 277,458 | 539,489 | 11,585 | 1,704,208 | |
| **TOTAL** | | 414 345 | 31 967 | 26 968 | 425 543 | 276 772 | 539 774 | 11 664 | 1 727 026 | | |
| |
N.B.: Each Czech text is counted only once, even though it may have more than one foreign counterpart. | N.B.: Each Czech text is counted only once, even though it may have more than one foreign counterpart. |
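The counting rule in the note above can be illustrated with a minimal sketch. The text IDs, word counts, and function names below are invented for illustration and are not taken from the corpus or its tooling; the sketch only shows why counting Czech once per aligned pair would overstate the Czech size.

```python
# Hypothetical alignment data: each pair links one Czech text to one
# foreign counterpart. IDs and word counts are made up for illustration.
alignments = [
    ("cs_001", "en_001"),
    ("cs_001", "de_001"),  # the same Czech text has a second counterpart
    ("cs_002", "en_002"),
]
word_counts = {
    "cs_001": 1200,
    "cs_002": 800,
    "en_001": 1300,
    "de_001": 1150,
    "en_002": 850,
}


def czech_size(pairs, counts):
    """Sum Czech word counts, counting each Czech text only once."""
    unique_czech = {cs for cs, _ in pairs}
    return sum(counts[cs] for cs in unique_czech)


def naive_czech_size(pairs, counts):
    """Sum Czech word counts once per aligned pair (overcounts)."""
    return sum(counts[cs] for cs, _ in pairs)


print(czech_size(alignments, word_counts))        # deduplicated size
print(naive_czech_size(alignments, word_counts))  # inflated size
```

Under these assumed numbers the deduplicated figure is smaller than the per-pair figure, which is why the Czech row cannot be obtained by adding up the Czech sides of the individual language pairs.
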
^ Language ^ Tags ^ Lemmas ^ Brief description ^ Detailed description ^ Tool ^ | ^ Language ^ Tags ^ Lemmas ^ Brief description ^ Detailed description ^ Tool ^ |
^ Belarusian | ✔ | ✔ | | [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%) | [[https://web.archive.org/web/20170122231904/http://lindat.mff.cuni.cz/services/udpipe/api-reference.php|UDPipe]] | | ^ Belarusian | ✔ | ✔ | | [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%) | [[https://web.archive.org/web/20170122231904/http://lindat.mff.cuni.cz/services/udpipe/api-reference.php|UDPipe]] | |
^ Bulgarian | ✔ | ✔ | [[https://www.sketchengine.eu/bulgarian-treebank-part-of-speech-tagset/|in English]] | [[http://bultreebank.org/en/resources/short-description-dependency-part-bultreebank-bultreebank-dp/btb-tr03-2/|in English]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Bulgarian | ✔ | ✔ | | [[http://bultreebank.org/en/resources/short-description-dependency-part-bultreebank-bultreebank-dp/btb-tr03-2/|in English]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Catalan | ✔ | ✔ | [[http://clic.ub.edu/corpus/webfm_send/18|in English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Catalan | ✔ | ✔ | [[http://clic.ub.edu/corpus/webfm_send/18|in English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Chinese | ✔ | | [[https://www.sketchengine.eu/chinese-penn-treebank-part-of-speech-tagset/|in English]] | [[https://repository.upenn.edu/cgi/viewcontent.cgi?article=1039&context=ircs_reports|in English]] | [[https://www.sutd.edu.sg/cmsresource/faculty/yuezhang/zpar.html|ZPar v0.7.5]] | | |
^ Croatian | ✔ | ✔ | [[https://github.com/ffnlp/sethr/blob/master/mte4r-upos.mapping|in English]] | | [[https://github.com/uzh/reldi|ReLDI Tagger]] | | ^ Croatian | ✔ | ✔ | [[https://github.com/ffnlp/sethr/blob/master/mte4r-upos.mapping|in English]] | | [[https://github.com/uzh/reldi|ReLDI Tagger]] | |
^ Czech | ✔ | ✔ | [[http://wiki.korpus.cz/doku.php/seznamy:tagy|in Czech]] and [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html|English]] | [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/docc0pos.pdf|in English]] | [[http://ufal.mff.cuni.cz/morce/index.php|Morče]] | | ^ Czech | ✔ | ✔ | [[http://wiki.korpus.cz/doku.php/seznamy:tagy|in Czech]] and [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html|English]] | [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/docc0pos.pdf|in English]] | [[http://ufal.mff.cuni.cz/morce/index.php|Morče]] | |
^ Dutch | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/dutch-tagset.txt|in English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Dutch | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/dutch-tagset.txt|in English]] | [[http://www.inl.nl/tst-centrale/images/stories/producten/documentatie/ehc_handleiding_nl.pdf|in Dutch]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ English | ✔ | ✔ | [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/PennTreebankTags.pdf|in English]] | [[http://utkl.ff.cuni.cz/%7Erosen/public/Penn-Treebank-Tagset.pdf|in English]] + [[http://utkl.ff.cuni.cz/%7Erosen/public/PennTagAdd.html|additions]] | [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]] | | ^ English | ✔ | ✔ | [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/PennTreebankTags.pdf|in English]] | [[http://utkl.ff.cuni.cz/%7Erosen/public/Penn-Treebank-Tagset.pdf|in English]] + [[http://utkl.ff.cuni.cz/%7Erosen/public/PennTagAdd.html|additions]] | [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]] | |
^ Estonian | ✔ | ✔ | [[http://www.cl.ut.ee/korpused/morfliides/seletus|in Estonian and English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Estonian | ✔ | ✔ | [[http://www.cl.ut.ee/korpused/morfliides/seletus|in Estonian and English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Finnish | ✔ | ✔ | [[https://www.sketchengine.co.uk/finntreebank/|in English]]%%*%%) | [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/sources/FinnTreeBankManual.pdf|in English]]%%*%%) | [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/omorfi/README.shtml|OMorFi]] + [[https://code.google.com/archive/p/hunpos/|HunPOS]] | | ^ Finnish | ✔ | ✔ | [[https://www.sketchengine.co.uk/finntreebank/|in English]]%%*%%) | [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/sources/FinnTreeBankManual.pdf|in English]]%%*%%) | [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/omorfi/README.shtml|OMorFi]] +[[https://code.google.com/archive/p/hunpos/|HunPOS]] | |
^ French | ✔ | ✔ | [[http://www.ims.uni-stuttgart.de/%7Eschmid/french-tagset.html|in English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ French | ✔ | ✔ | [[http://www.ims.uni-stuttgart.de/%7Eschmid/french-tagset.html|in English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ German | ✔ | ✔ | [[https://www.sketchengine.co.uk/German-rftagger-part-of-speech-tagset/|in English]]%%**%% | [[http://utkl.ff.cuni.cz/%7Erosen/public/stts_guide.pdf|in German]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] | | ^ German | ✔ | ✔ | [[https://www.sketchengine.co.uk/German-rftagger-part-of-speech-tagset/|in English]]%%**%% | [[http://utkl.ff.cuni.cz/%7Erosen/public/stts_guide.pdf|in German]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] | |
^ Hungarian | ✔ | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v12_hu&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=|List]] | [[http://www.inf.u-szeged.hu/projectdirs/hlt/en/Szeged%20Treebank%202.0_en.html|in English]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] | | ^ Hungarian | ✔ | | [[http://nl.ijs.si/ME/Vault/V3/msd/html/msd.html#SECTION05400000000000000000|in English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] | |
^ Icelandic | ✔ | ✔ | [[http://www.malfong.is/files/ot_tagset_files_en.pdf|in English]] | | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|IceStagger]] | | ^ Icelandic | ✔ | ✔ | [[http://www.malfong.is/files/ot_tagset_files_en.pdf|in English]] | | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|IceStagger]] | |
^ Italian | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-tagset.txt|in English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Italian | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-tagset.txt|in English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Japanese | ✔ | ✔ | [[https://www.sketchengine.eu/tagset-jp-mecab/|in English]] | | [[https://taku910.github.io/mecab/|MeCab]] + [[https://unidic.ninjal.ac.jp|Unidic]] | | ^ Japanese | ✔ | ✔ | [[https://www.sketchengine.eu/tagset-jp-mecab/|in English]] | | [[https://taku910.github.io/mecab/|MeCab]] | |
^ Latvian | ✔ | ✔ | [[http://www.semti-kamols.lv/doc_upl/TagSet.html|in Latvian]] | | [[https://peteris.rocks/blog/latvian-part-of-speech-tagging|LVTagger]] | | ^ Latvian | ✔ | ✔ | [[http://www.semti-kamols.lv/doc_upl/TagSet.html|in Latvian]] | | [[https://peteris.rocks/blog/latvian-part-of-speech-tagging|LVTagger]] | |
^ Norwegian | ✔ | ✔ | [[http://tekstlab.uio.no/obt-ny/english/tagset.html|in English]] and [[http://tekstlab.uio.no/obt-ny/index.html|Norwegian]] | | [[https://visl.sdu.dk/remoting.html|VISL]] | | ^ Norwegian | ✔ | ✔ | [[http://tekstlab.uio.no/obt-ny/english/tagset.html|in English]] and [[http://tekstlab.uio.no/obt-ny/index.html|Norwegian]] | | [[https://visl.sdu.dk/remoting.html|VISL]] | |
^ Polish | ✔ | ✔ | [[http://nkjp.pl/poliqarp/help/ense2.html#x3-20002|in English]] and [[http://nkjp.pl/poliqarp/help/plse2.html#x3-20002|Polish]] | [[http://nlp.ipipan.waw.pl/%7Eadamp/Papers/2003-eacl-ws12/|in English]] | [[http://sgjp.pl/morfeusz/|Morfeusz]] + [[https://github.com/kwrobel-nlp/krnnt|KRNNT]] | | ^ Polish | ✔ | ✔ | [[http://nkjp.pl/poliqarp/help/ense2.html#x3-20002|in English]] and [[http://nkjp.pl/poliqarp/help/plse2.html#x3-20002|Polish]] | [[http://nlp.ipipan.waw.pl/%7Eadamp/Papers/2003-eacl-ws12/|in English]] | [[http://sgjp.pl/morfeusz/|Morfeusz]], [[http://nlp.pwr.wroc.pl/takipi/|TaKIPI]] | |
^ Portuguese | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/Portuguese-Tagset.html|in Spanish]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Portuguese | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/Portuguese-Tagset.html|in Spanish]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Russian | ✔ | ✔ | [[http://corpus.leeds.ac.uk/mocky/ru-table.tab|in English]] | [[http://nl.ijs.si/ME/V4/msd/html/msd-ru.html|in English]]%%***%% | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Russian | ✔ | ✔ | [[http://corpus.leeds.ac.uk/mocky/ru-table.tab|in English]] | [[http://nl.ijs.si/ME/V4/msd/html/msd-ru.html|in English]]%%***%% | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Slovak | ✔ | ✔ | [[https://korpus.sk/morpho_en.html/|in English]] | [[https://korpus.sk/attachments/morpho_en/tagset-www.pdf|in Slovak]] | [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Radovan Garabík, Morče]] | | ^ Slovak | ✔ | ✔ | [[http://korpus.sk/morpho.html/|in Slovak]] | [[http://korpus.sk/attachments/publications/2004-garabik-gianitsova-horak-simkova-tokenizacia.pdf|in Slovak]] | [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Radovan Garabík, Morče]] | |
^ Slovene | ✔ | ✔ | [[https://www.sketchengine.eu/slovene-tagset-multext-east-v3/|in English]] | [[http://nl.ijs.si/ME/V4/msd/html/msd-sl.html|in English]] | [[http://nl2.ijs.si/analyze/|ToTaLe]] | | ^ Slovene | ✔ | ✔ | | [[http://nl.ijs.si/ME/V4/msd/html/msd-sl.html|in English]] | [[http://nl2.ijs.si/analyze/|ToTaLe]] | |
^ Serbian | ✔ | ✔ | [[https://www.sketchengine.eu/multext-east-serbian-part-of-speech-tagset/|in English]] | [[http://nl.ijs.si/ME/V4/msd/html/msd-sr.html|in English]] | [[https://github.com/uzh/reldi|ReLDI Tagger]] | | ^ Serbian | ✔ | ✔ | | [[http://nl.ijs.si/ME/V4/msd/html/msd-sr.html|in English]] | [[https://github.com/uzh/reldi|ReLDI Tagger]] | |
^ Spanish | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/spanish-tagset.txt|in English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Spanish | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/spanish-tagset.txt|in English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Swedish | ✔ | ✔ | [[http://spraakbanken.gu.se/korp/markup/msdtags.html|in Swedish and English]] | | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|Stagger]] | | ^ Swedish | ✔ | ✔ | [[http://spraakbanken.gu.se/korp/markup/msdtags.html|in Swedish and English]] | | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|Stagger]] | |
^ Ukrainian | ✔ | ✔ | | [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%) | [[https://web.archive.org/web/20170122231904/http://lindat.mff.cuni.cz/services/udpipe/api-reference.php|UDPipe]] | | ^ Ukrainian | ✔ | ✔ | [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%) | | [[https://web.archive.org/web/20170122231904/http://lindat.mff.cuni.cz/services/udpipe/api-reference.php|UDPipe]] | |
| |
| |
* [[http://ufal.mff.cuni.cz/morfflex|MorfFlex]], [[http://ufal.mff.cuni.cz/morce/index.php|Morče]] and [[https://is.cuni.cz/webapps/zzp/download/140018093/?back_id=10|LanGr]] for Czech | * [[http://ufal.mff.cuni.cz/morfflex|MorfFlex]], [[http://ufal.mff.cuni.cz/morce/index.php|Morče]] and [[https://is.cuni.cz/webapps/zzp/download/140018093/?back_id=10|LanGr]] for Czech |
* [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]] for Bulgarian, Dutch, English, Estonian (thanks to Helmut Schmid), French, Italian, Portuguese (thanks to Pablo Gamallo), Russian and Spanish | * [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]] for Bulgarian, Dutch, English, Estonian (thanks to Helmut Schmid), French, Italian, Portuguese (thanks to Pablo Gamallo), Russian and Spanish |
* [[http://sgjp.pl/morfeusz/|Morfeusz]] and [[https://github.com/kwrobel-nlp/krnnt|KRNNT]] for Polish | * [[http://sgjp.pl/morfeusz/|Morfeusz]] and [[http://nlp.pwr.wroc.pl/takipi/|TaKIPI]] for Polish |
* [[http://code.google.com/p/hunpos/|HunPOS]] for Hungarian and other languages | * [[http://code.google.com/p/hunpos/|HunPOS]] for Hungarian and other languages |
* [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Tagger for Slovak]] (thanks to Radovan Garabík) | * [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Tagger for Slovak]] (thanks to Radovan Garabík) |
* [[https://peteris.rocks/blog/latvian-part-of-speech-tagging/|LVTagger]] for Latvian (thanks to Pēteris Paikens and Michal Škrabal) | * [[https://peteris.rocks/blog/latvian-part-of-speech-tagging/|LVTagger]] for Latvian (thanks to Pēteris Paikens and Michal Škrabal) |
* [[http://ufal.mff.cuni.cz/udpipe|UDPipe]] for Belarusian and Ukrainian (thanks to Bohdan Moskalevskyi) | * [[http://ufal.mff.cuni.cz/udpipe|UDPipe]] for Belarusian and Ukrainian (thanks to Bohdan Moskalevskyi) |
* [[https://taku910.github.io/mecab/|MeCab]] and [[https://osdn.net/projects/unidic/|Unidic]] for Japanese (thanks to Adam Nohejl) | * [[https://taku910.github.io/mecab/|MeCab]] and [[https://osdn.net/projects/unidic/|Unidic]] for Japanese |
* [[https://www.sutd.edu.sg/cmsresource/faculty/yuezhang/zpar.html|ZPar]] for Chinese (thanks to Vlastimil Dobečka) | |
| |
| |
| |
<WRAP round box 51%> | <WRAP round box 51%> |
[[en:cnk:intercorp|InterCorp]] • [[en:cnk:intercorp:verze11|Version 11]] • [[en:cnk:intercorp:verze10|Version 10]] • [[en:cnk:intercorp:verze9|Version 9]] • [[en:cnk:intercorp:verze8|Version 8]] • [[en:cnk:intercorp:verze7|Version 7]] • [[en:cnk:intercorp:verze6|Version 6]] • [[en:cnk:intercorp:verze5|Version 5]] • [[en:cnk:intercorp:verze4|Version 4]] • [[en:cnk:intercorp:verze3|Version 3]] • [[en:cnk:intercorp:historie|Version history]] | [[en:cnk:intercorp|InterCorp]] • [[en:cnk:intercorp:verze10|Version 10]] • [[en:cnk:intercorp:verze9|Version 9]] • [[en:cnk:intercorp:verze8|Version 8]] • [[en:cnk:intercorp:verze7|Version 7]] • [[en:cnk:intercorp:verze6|Version 6]] • [[en:cnk:intercorp:verze5|Version 5]] • [[en:cnk:intercorp:verze4|Version 4]] • [[en:cnk:intercorp:verze3|Version 3]] • [[en:cnk:intercorp:historie|Version history]] |
| |
See [[https://intercorp.korpus.cz/?lang=en|the original InterCorp site in English]]. | See [[http://ucnk.ff.cuni.cz/intercorp/?lang=en|the original InterCorp site in English]]. |
</WRAP> | </WRAP> |
| |