Previous revision | Current revision |
en:cnk:intercorp:verze16 [2023/10/11 16:19] – [Texts in the corpus] alexandrrosen | en:cnk:intercorp:verze16 [2024/08/14 11:49] (current) – [How to cite] alexandrrosen |
---|
| |
^ Language ^^ Core ^ Syndicate ^ Presseurop ^ Acquis ^ Europarl ^ Subtitles ^ Bible ^ Total ^ | ^ Language ^^ Core ^ Syndicate ^ Presseurop ^ Acquis ^ Europarl ^ Subtitles ^ Bible ^ Total ^ |
^ ar ^ Arabic | 34 | 384 | 0 | 0 | 0 | 0 | 0 | 418 | | ^ af ^ Afrikaans | 0 | 0 | 0 | 0 | 0 | 136 | 0 | 136 | |
^ be ^ Belarusian | 6 524 | 0 | 0 | 0 | 0 | 0 | 0 | 6 524 | | ^ ar ^ Arabic | 34 | 384 | 0 | 0 | 0 | 126 157 | 0 | 126 576 | |
^ bg ^ Bulgarian | 7 068 | 0 | 0 | 13 577 | 9 083 | 0 | 0 | 29 728 | | ^ be ^ Belarusian | 7 131 | 0 | 0 | 0 | 0 | 0 | 0 | 7 131 | |
^ ca ^ Catalan | 8 920 | 0 | 0 | 0 | 0 | 0 | 736 | 9 656 | | ^ bg ^ Bulgarian | 7 068 | 0 | 0 | 13 577 | 9 083 | 165 092 | 0 | 194 820 | |
^ da ^ Danish | 8 456 | 0 | 0 | 20 313 | 13 916 | 14 429 | 657 | 57 770 | | ^ bn ^ Bengali | 0 | 0 | 0 | 0 | 0 | 1 554 | 0 | 1 554 | |
^ de ^ German | 39 412 | 5 067 | 2 483 | 20 610 | 13 088 | 8 392 | 724 | 89 776 | | ^ br ^ Breton | 0 | 0 | 0 | 0 | 0 | 98 | 0 | 98 | |
^ el ^ Greek | 0 | 0 | 0 | 23 853 | 15 404 | 23 709 | 0 | 62 966 | | ^ bs ^ Bosnian | 0 | 0 | 0 | 0 | 0 | 58 758 | 0 | 58 758 | |
^ en ^ English | 38 706 | 5 273 | 2 670 | 22 902 | 15 576 | 52 106 | 730 | 137 964 | | ^ ca ^ Catalan | 10 112 | 0 | 0 | 0 | 0 | 2 735 | 736 | 13 582 | |
^ es ^ Spanish | 29 145 | 6 074 | 2 859 | 26 262 | 16 249 | 36 650 | 0 | 117 239 | | ^ cs ^ Czech | 124 680 | 4 717 | 2 312 | 19 214 | 12 917 | 233 139 | 563 | 397 542 | |
^ et ^ Estonian | 0 | 0 | 0 | 14 896 | 10 899 | 10 298 | 0 | 36 093 | | ^ da ^ Danish | 9 548 | 0 | 0 | 20 313 | 13 916 | 71 825 | 657 | 116 259 | |
^ fi ^ Finnish | 6 674 | 0 | 0 | 15 269 | 10 108 | 15 047 | 543 | 47 641 | | ^ de ^ German | 40 679 | 5 067 | 2 483 | 20 610 | 13 089 | 98 566 | 724 | 181 219 | |
^ fr ^ French | 21 996 | 5 896 | 3 046 | 26 200 | 17 179 | 25 986 | 764 | 101 067 | | ^ el ^ Greek | 0 | 0 | 0 | 23 853 | 15 404 | 162 561 | 0 | 201 818 | |
^ he ^ Hebrew | 0 | 0 | 0 | 0 | 0 | 16 221 | 0 | 16 221 | | ^ en ^ English | 42 395 | 5 273 | 2 670 | 22 902 | 15 576 | 280 335 | 730 | 369 882 | |
^ hi ^ Hindi | 409 | 0 | 0 | 0 | 0 | 0 | 0 | 409 | | ^ eo ^ Esperanto | 0 | 0 | 0 | 0 | 0 | 226 | 0 | 226 | |
^ hr ^ Croatian | 23 351 | 0 | 0 | 0 | 0 | 19 048 | 571 | 42 971 | | ^ es ^ Spanish | 30 661 | 6 074 | 2 859 | 26 262 | 16 249 | 223 134 | 0 | 305 240 | |
^ hs ^ Upper Sorbian | 128 | 0 | 0 | 0 | 0 | 0 | 0 | 128 | | ^ et ^ Estonian | 79 | 0 | 0 | 14 896 | 10 899 | 54 514 | 0 | 80 388 | |
^ hu ^ Hungarian | 6 922 | 8 | 0 | 17 852 | 12 198 | 21 115 | 0 | 58 095 | | ^ eu ^ Basque | 0 | 0 | 0 | 0 | 0 | 3 022 | 0 | 3 022 | |
^ is ^ Icelandic | 0 | 0 | 0 | 0 | 0 | 1 581 | 0 | 1 581 | | ^ fa ^ Persian | 0 | 0 | 0 | 0 | 0 | 33 167 | 0 | 33 167 | |
^ it ^ Italian | 16 384 | 1 389 | 2 747 | 23 771 | 15 494 | 14 700 | 684 | 75 169 | | ^ fi ^ Finnish | 6 959 | 0 | 0 | 15 269 | 10 108 | 90 471 | 543 | 123 349 | |
^ ja ^ Japanese | 3 491 | 2 | 0 | 0 | 0 | 477 | 0 | 3 970 | | ^ fr ^ French | 24 361 | 5 896 | 3 046 | 26 200 | 17 179 | 181 433 | 764 | 258 879 | |
^ lt ^ Lithuanian | 502 | 0 | 0 | 17 316 | 11 213 | 558 | 471 | 30 059 | | ^ gl ^ Galician | 0 | 0 | 0 | 0 | 0 | 623 | 0 | 623 | |
^ lv ^ Latvian | 3 437 | 0 | 0 | 17 522 | 11 682 | 280 | 537 | 33 458 | | ^ he ^ Hebrew | 0 | 0 | 0 | 0 | 0 | 130 143 | 0 | 130 143 | |
^ mk ^ Macedonian | 8 881 | 0 | 0 | 0 | 0 | 1 877 | 0 | 10 758 | | ^ hi ^ Hindi | 409 | 0 | 0 | 0 | 0 | 432 | 0 | 841 | |
^ ms ^ Malay | 0 | 0 | 0 | 0 | 0 | 3 521 | 0 | 3 521 | | ^ hr ^ Croatian | 24 529 | 0 | 0 | 0 | 0 | 137 966 | 571 | 163 066 | |
| ^ hs ^ Upper Sorbian | 466 | 0 | 0 | 0 | 0 | 0 | 0 | 466 | |
| ^ hu ^ Hungarian | 6 921 | 8 | 0 | 17 852 | 12 198 | 141 691 | 0 | 178 670 | |
| ^ hy ^ Armenian | 0 | 0 | 0 | 0 | 0 | 24 | 0 | 24 | |
| ^ id ^ Indonesian | 0 | 0 | 0 | 0 | 0 | 38 343 | 0 | 38 343 | |
| ^ is ^ Icelandic | 0 | 0 | 0 | 0 | 0 | 7 375 | 0 | 7 375 | |
| ^ it ^ Italian | 18 086 | 1 389 | 2 747 | 23 771 | 15 494 | 163 622 | 684 | 225 793 | |
| ^ ja ^ Japanese | 3 818 | 2 | 0 | 0 | 0 | 12 485 | 0 | 16 305 | |
| ^ ka ^ Georgian | 0 | 0 | 0 | 0 | 0 | 889 | 0 | 889 | |
| ^ kk ^ Kazakh | 0 | 0 | 0 | 0 | 0 | 14 | 0 | 14 | |
| ^ ko ^ Korean | 0 | 0 | 0 | 0 | 0 | 5 980 | 0 | 5 980 | |
| ^ lt ^ Lithuanian | 696 | 0 | 0 | 17 316 | 11 213 | 5 269 | 471 | 34 964 | |
| ^ lv ^ Latvian | 3 636 | 0 | 0 | 17 533 | 11 682 | 2 053 | 537 | 35 441 | |
| ^ mk ^ Macedonian | 8 881 | 0 | 0 | 0 | 0 | 15 595 | 0 | 24 476 | |
| ^ ml ^ Malayalam | 0 | 0 | 0 | 0 | 0 | 1 281 | 0 | 1 281 | |
| ^ ms ^ Malay | 0 | 0 | 0 | 0 | 0 | 7 939 | 0 | 7 939 | |
^ mt ^ Maltese | 0 | 0 | 0 | 13 935 | 0 | 0 | 0 | 13 935 | | ^ mt ^ Maltese | 0 | 0 | 0 | 13 935 | 0 | 0 | 0 | 13 935 | |
^ nl ^ Dutch | 17 769 | 812 | 2 953 | 23 416 | 15 558 | 29 373 | 717 | 90 598 | | ^ nl ^ Dutch | 18 782 | 812 | 2 953 | 23 416 | 15 558 | 170 979 | 717 | 233 217 | |
^ no ^ Norwegian | 7 851 | 0 | 0 | 0 | 0 | 0 | 724 | 8 575 | | ^ no ^ Norwegian | 8 221 | 0 | 0 | 0 | 0 | 39 807 | 724 | 48 752 | |
^ pl ^ Polish | 28 112 | 0 | 2 380 | 19 604 | 12 817 | 26 576 | 583 | 90 072 | | ^ pl ^ Polish | 28 597 | 0 | 2 380 | 19 604 | 12 817 | 169 498 | 583 | 233 480 | |
^ pt ^ Portuguese | 6 943 | 739 | 2 782 | 24 598 | 15 193 | 41 468 | 706 | 92 429 | | ^ pt ^ Portuguese | 7 285 | 739 | 2 782 | 24 598 | 15 193 | 229 515 | 706 | 280 818 | |
^ rn ^ Romani | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 14 | | ^ rn ^ Romani | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 14 | |
^ ro ^ Romanian | 4 219 | 0 | 2 738 | 8 092 | 9 446 | 34 128 | 0 | 58 622 | | ^ ro ^ Romanian | 4 219 | 0 | 2 738 | 8 092 | 9 446 | 212 396 | 0 | 236 890 | |
^ ru ^ Russian | 10 549 | 4 302 | 0 | 0 | 0 | 6 887 | 565 | 22 303 | | ^ ru ^ Russian | 12 387 | 4 302 | 0 | 0 | 0 | 104 609 | 565 | 121 864 | |
^ sk ^ Slovak | 8 596 | 0 | 0 | 18 399 | 12 727 | 5 133 | 561 | 45 416 | | ^ si ^ Sinhala | 0 | 0 | 0 | 0 | 0 | 2 346 | 0 | 2 346 | |
^ sl ^ Slovene | 4 354 | 0 | 0 | 18 515 | 12 241 | 17 035 | 0 | 52 144 | | ^ sk ^ Slovak | 8 586 | 0 | 0 | 18 399 | 12 727 | 34 581 | 561 | 74 854 | |
^ sq ^ Albanian | 0 | 0 | 0 | 0 | 0 | 2 003 | 0 | 2 003 | | ^ sl ^ Slovene | 4 636 | 0 | 0 | 18 515 | 12 241 | 83 000 | 0 | 118 392 | |
^ sr ^ Serbian | 12 356 | 0 | 0 | 0 | 0 | 20 727 | 0 | 33 082 | | ^ sq ^ Albanian | 0 | 0 | 0 | 0 | 0 | 9 351 | 0 | 9 351 | |
^ sv ^ Swedish | 17 877 | 0 | 0 | 19 542 | 13 784 | 14 666 | 638 | 66 507 | | ^ sr ^ Serbian | 12 706 | 0 | 0 | 0 | 0 | 152 636 | 0 | 165 342 | |
^ tr ^ Turkish | 0 | 0 | 0 | 0 | 0 | 21 190 | 0 | 21 190 | | ^ sv ^ Swedish | 19 740 | 0 | 0 | 19 542 | 13 784 | 81 548 | 638 | 135 252 | |
^ uk ^ Ukrainian | 12 712 | 0 | 0 | 0 | 0 | 244 | 596 | 13 551 | | ^ ta ^ Tamil | 0 | 0 | 0 | 0 | 0 | 104 | 0 | 104 | |
^ vi ^ Vietnamese | 0 | 0 | 0 | 0 | 0 | 1 474 | 0 | 1 474 | | ^ te ^ Telugu | 0 | 0 | 0 | 0 | 0 | 96 | 0 | 96 | |
^ zh ^ Chinese | 202 | 604 | 0 | 0 | 0 | 2 247 | 0 | 3 054 | | ^ th ^ Thai | 0 | 0 | 0 | 0 | 0 | 5 660 | 0 | 5 660 | |
^ **Subtotal** ^ | 361 991 | 30 552 | 24 658 | 406 445 | 263 854 | 489 143 | 11 507 | 1 588 151 | | ^ tl ^ Tagalog | 0 | 0 | 0 | 0 | 0 | 38 | 0 | 38 | |
^ cs ^ Czech | 119 933 | 4 712 | 2 310 | 19 085 | 12 908 | 50 604 | 562 | 210 114 | | ^ tr ^ Turkish | 0 | 0 | 0 | 0 | 0 | 149 892 | 0 | 149 892 | |
^ **TOTAL** ^ | 481 925 | 35 264 | 26 968 | 425 530 | 276 763 | 539 747 | 12 069 | 1 798 266 | | ^ uk ^ Ukrainian | 14 849 | 0 | 0 | 0 | 0 | 2 938 | 596 | 18 382 | |
| ^ ur ^ Urdu | 0 | 0 | 0 | 0 | 0 | 158 | 0 | 158 | |
| ^ vi ^ Vietnamese | 0 | 0 | 0 | 0 | 0 | 22 298 | 0 | 22 298 | |
| ^ zh ^ Chinese | 238 | 838 | 0 | 0 | 0 | 71 331 | 0 | 72 407 | |
| ^ **TOTAL** ^ | 511 408 | 35 503 | 26 971 | 425 670 | 276 772 | 4 001 428 | 12 069 | 5 289 821 | |
| |
N.B.: Each Czech text is counted only once, even though it may have more than one foreign counterpart. | N.B.: Each Czech text is counted only once, even though it may have more than one foreign counterpart. |
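To make the relationship between the columns explicit, here is a minimal Python sketch that reads rows copied from the wiki source of the current table and checks that the subcorpus columns add up to the Total column. A few rows differ from the sum of their cells by one (German: the cells sum to 181 218, the Total is 181 219), presumably because of rounding, so the check allows a tolerance of 1 by default. The last helper illustrates the counting rule in the note above. All function names and the text IDs are hypothetical and not part of any InterCorp tool.

```python
"""Sanity checks for the size table, written against the wiki source."""


def parse_row(row: str) -> tuple[str, list[int], int]:
    """Split a DokuWiki table row into (label, per-subcorpus counts, Total).

    Assumes the layout '^ code ^ Name | n | n | ... | total |', where '^'
    opens header cells, '|' opens data cells and spaces separate thousands.
    """
    header, _, data = row.partition("|")
    label = header.strip("^ ").split("^")[0].strip()
    cells = [cell.strip() for cell in data.split("|") if cell.strip()]
    numbers = [int(cell.replace(" ", "")) for cell in cells]
    return label, numbers[:-1], numbers[-1]


def row_is_consistent(row: str, tolerance: int = 1) -> bool:
    """True if the subcorpus counts add up to the Total within `tolerance`."""
    _, parts, total = parse_row(row)
    return abs(sum(parts) - total) <= tolerance


def count_czech_texts(pairs: list[tuple[str, str]]) -> int:
    """Count each Czech text once, however many foreign counterparts it has.

    `pairs` holds (Czech text ID, foreign language) tuples; the IDs used
    below are made up for illustration.
    """
    return len({czech_id for czech_id, _ in pairs})


CZECH = "^ cs ^ Czech | 124 680 | 4 717 | 2 312 | 19 214 | 12 917 | 233 139 | 563 | 397 542 |"
GERMAN = "^ de ^ German | 40 679 | 5 067 | 2 483 | 20 610 | 13 089 | 98 566 | 724 | 181 219 |"

print(row_is_consistent(CZECH))                # True (exact match)
print(row_is_consistent(GERMAN, tolerance=0))  # False (the cells sum to 181 218)
print(count_czech_texts([("cs-001", "en"), ("cs-001", "de"), ("cs-002", "en")]))  # 2
```

Using a set of Czech text IDs is what makes a text aligned with both English and German count once rather than twice.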
| |
| ==== Number of texts in the Core ==== |
| |
| ^ Language ^^ Number of texts ^ of which originals ^ |
| ^ ar ^ Arabic | 3 | 1 | |
| ^ be ^ Belarusian | 108 | 14 | |
| ^ bg ^ Bulgarian | 87 | 19 | |
| ^ ca ^ Catalan | 92 | 1 | |
| ^ cs ^ Czech | 1 812 | 368 | |
| ^ da ^ Danish | 93 | 9 | |
| ^ de ^ German | 471 | 163 | |
| ^ en ^ English | 422 | 271 | |
| ^ es ^ Spanish | 355 | 142 | |
| ^ et ^ Estonian | 1 | 0 | |
| ^ fi ^ Finnish | 112 | 36 | |
| ^ fr ^ French | 277 | 126 | |
| ^ hi ^ Hindi | 7 | 2 | |
| ^ hr ^ Croatian | 324 | 37 | |
| ^ hs ^ Upper Sorbian | 13 | 5 | |
| ^ hu ^ Hungarian | 89 | 1 | |
| ^ it ^ Italian | 171 | 26 | |
| ^ ja ^ Japanese | 35 | 15 | |
| ^ lt ^ Lithuanian | 23 | 4 | |
| ^ lv ^ Latvian | 73 | 15 | |
| ^ mk ^ Macedonian | 108 | 4 | |
| ^ nl ^ Dutch | 215 | 52 | |
| ^ no ^ Norwegian | 102 | 23 | |
| ^ pl ^ Polish | 348 | 54 | |
| ^ pt ^ Portuguese | 87 | 24 | |
| ^ rn ^ Romani | 2 | 2 | |
| ^ ro ^ Romanian | 45 | 5 | |
| ^ ru ^ Russian | 160 | 37 | |
| ^ sk ^ Slovak | 165 | 62 | |
| ^ sl ^ Slovene | 73 | 25 | |
| ^ sr ^ Serbian | 148 | 13 | |
| ^ sv ^ Swedish | 232 | 101 | |
| ^ uk ^ Ukrainian | 199 | 8 | |
| ^ zh ^ Chinese | 3 | 3 | |
| ^ **TOTAL** ^ | 6 455 | 1 668 | |
| |
===== Morphosyntactic annotation ===== | ===== Morphosyntactic annotation ===== |
| |
| |
^ Language ^ Tags ^ Lemmas ^ Brief description ^ Detailed description ^ Tags in the corpus ^ Tool ^ | ^ Language ^ Tags ^ Lemmas ^ Brief description ^ Detailed description ^ Tags in the corpus ^ Tool ^ |
^ Belarusian | ✔ | ✔ | [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%) | [[https://universaldependencies.org/be/index.html#morphology|in English]]%%****%%) | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_be&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://ufal.mff.cuni.cz/udpipe/2|UDPipe]] | | ^ Belarusian | ✔ | ✔ | [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%) | [[https://universaldependencies.org/be/index.html#morphology|in English]]%%****%%) | [[https://www.korpus.cz/kontext/wordlist/result?q=~WUgyKq0a2I2I|list]] | [[http://ufal.mff.cuni.cz/udpipe/2|UDPipe]] | |
^ Bulgarian | ✔ | ✔ | [[https://www.sketchengine.eu/bulgarian-treebank-part-of-speech-tagset/|in English]] | [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/BTB-TR03_BulTreeBank_morphosyntactic_tag.pdf|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_bg&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Bulgarian | ✔ | ✔ | [[https://www.sketchengine.eu/bulgarian-treebank-part-of-speech-tagset/|in English]] | [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/BTB-TR03_BulTreeBank_morphosyntactic_tag.pdf|in English]] | [[https://www.korpus.cz/kontext/wordlist/result?q=~deauEUMQSay2|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Catalan | ✔ | ✔ | [[http://clic.ub.edu/corpus/webfm_send/18|in English]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_ca&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Catalan | ✔ | ✔ | [[http://clic.ub.edu/corpus/webfm_send/18|in English]] | | [[https://www.korpus.cz/kontext/wordlist/result?q=~xIQI46GMkQMc|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Chinese | ✔ | | [[https://www.sketchengine.eu/chinese-penn-treebank-part-of-speech-tagset/|in English]] | [[https://repository.upenn.edu/cgi/viewcontent.cgi?article=1039&context=ircs_reports|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_zh&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[https://www.sutd.edu.sg/cmsresource/faculty/yuezhang/zpar.html|ZPar v0.7.5]] | | ^ Chinese | ✔ | | [[https://www.sketchengine.eu/chinese-penn-treebank-part-of-speech-tagset/|in English]] | [[https://repository.upenn.edu/cgi/viewcontent.cgi?article=1039&context=ircs_reports|in English]] | [[https://www.korpus.cz/kontext/wordlist/result?q=~Qy0WEKcyKCAG|list]] | [[https://www.sutd.edu.sg/cmsresource/faculty/yuezhang/zpar.html|ZPar v0.7.5]] | |
^ Croatian | ✔ | ✔ | [[https://github.com/ffnlp/sethr/blob/master/mte4r-upos.mapping|in English]] | [[http://nlp.ffzg.hr/data/tagging/msd-hr.html|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_hr&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]] | | ^ Croatian | ✔ | ✔ | [[https://github.com/ffnlp/sethr/blob/master/mte4r-upos.mapping|in English]] | [[http://nlp.ffzg.hr/data/tagging/msd-hr.html|in English]] | [[https://www.korpus.cz/kontext/wordlist/result?q=~ve6ySioUWoQo|list]] | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]] | |
^ Czech | ✔ | ✔ | [[http://wiki.korpus.cz/doku.php/seznamy:tagy|in Czech]] and [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html|English]] | [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/docc0pos.pdf|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_cs&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://ufal.mff.cuni.cz/morce/index.php|Morče]] | | ^ Czech | ✔ | ✔ | [[http://wiki.korpus.cz/doku.php/seznamy:tagy|in Czech]] and [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html|English]] | [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/docc0pos.pdf|in English]] | [[https://www.korpus.cz/kontext/wordlist/result?q=~dWMc6cC2mEYI|list]] | [[http://ufal.mff.cuni.cz/morce/index.php|Morče]] | |
^ Dutch | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/dutch-tagset.txt|in English]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_nl&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Dutch | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/dutch-tagset.txt|in English]] | | [[https://www.korpus.cz/kontext/wordlist/result?q=~58AMOGUAOg6I|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ English | ✔ | ✔ | [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/PennTreebankTags.pdf|in English]] | [[http://utkl.ff.cuni.cz/%7Erosen/public/Penn-Treebank-Tagset.pdf|in English]] + [[http://utkl.ff.cuni.cz/%7Erosen/public/PennTagAdd.html|additions]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_en&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ English | ✔ | ✔ | [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/PennTreebankTags.pdf|in English]] | [[http://utkl.ff.cuni.cz/%7Erosen/public/Penn-Treebank-Tagset.pdf|in English]] + [[http://utkl.ff.cuni.cz/%7Erosen/public/PennTagAdd.html|additions]] | [[https://www.korpus.cz/kontext/wordlist/result?q=~AoIeKE4AOIoO|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Estonian | ✔ | ✔ | [[http://www.cl.ut.ee/korpused/morfliides/seletus|in Estonian and English]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_et&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Estonian | ✔ | ✔ | [[http://www.cl.ut.ee/korpused/morfliides/seletus|in Estonian and English]] | | [[https://www.korpus.cz/kontext/wordlist/result?q=~OYogQQcMUc86|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Finnish | ✔ | ✔ | [[https://www.sketchengine.co.uk/finntreebank|in English]]%%*%%) | [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/sources/FinnTreeBankManual.pdf|in English]]%%*%%) | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_fi&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] |[[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/omorfi/README.shtml|OMorFi]] +[[https://code.google.com/archive/p/hunpos/|HunPOS]] | | ^ Finnish | ✔ | ✔ | [[https://www.sketchengine.co.uk/finntreebank|in English]]%%*%%) | [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/sources/FinnTreeBankManual.pdf|in English]]%%*%%) | [[https://www.korpus.cz/kontext/wordlist/result?q=~BwiUqc2SoaKY|list]] |[[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/omorfi/README.shtml|OMorFi]] +[[https://code.google.com/archive/p/hunpos/|HunPOS]] | |
^ French | ✔ | ✔ | [[https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/french-tagset.html|in English]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_fr&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ French | ✔ | ✔ | [[https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/french-tagset.html|in English]] | | [[https://www.korpus.cz/kontext/wordlist/result?q=~MEY8qsoECM42|list]] |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ German | ✔ | ✔ | [[https://www.sketchengine.co.uk/German-rftagger-part-of-speech-tagset/|in English]] %%**%%) | [[http://utkl.ff.cuni.cz/%7Erosen/public/stts_guide.pdf|in German]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_de&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] | | ^ German | ✔ | ✔ | [[https://www.sketchengine.co.uk/German-rftagger-part-of-speech-tagset/|in English]] %%**%%) | [[http://utkl.ff.cuni.cz/%7Erosen/public/stts_guide.pdf|in German]] | [[https://www.korpus.cz/kontext/wordlist/result?q=~gs4MCm8iuEea|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] | |
^ Hungarian | ✔ | | | [[http://www.inf.u-szeged.hu/projectdirs/hlt/en/Szeged%20Treebank%202.0_en.html|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_hu&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] | | ^ Hungarian | ✔ | | | [[http://www.inf.u-szeged.hu/projectdirs/hlt/en/Szeged%20Treebank%202.0_en.html|in English]] | [[https://www.korpus.cz/kontext/wordlist/result?q=~CCeWgGmqmcqi|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] | |
^ Icelandic | ✔ | ✔ | [[http://www.malfong.is/files/ot_tagset_files_en.pdf|in English]] | [[http://nlp.cs.ru.is/pdf/Tagset.pdf|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_is&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|IceStagger]] | | ^ Icelandic | ✔ | ✔ | [[http://www.malfong.is/files/ot_tagset_files_en.pdf|in English]] | [[http://nlp.cs.ru.is/pdf/Tagset.pdf|in English]] | [[https://www.korpus.cz/kontext/wordlist/result?q=~OSQqSoscsiiG|list]] | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|IceStagger]] | |
^ Italian | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-tagset.txt|in English]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_it&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Italian | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-tagset.txt|in English]] | | [[https://www.korpus.cz/kontext/wordlist/result?q=~AG82UCM6swiK|list]] |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Japanese | ✔ | ✔ | [[https://www.sketchengine.eu/tagset-jp-mecab/|in English]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_ja&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[https://taku910.github.io/mecab/|MeCab]] + [[https://unidic.ninjal.ac.jp|Unidic]] | | ^ Japanese | ✔ | ✔ | [[https://www.sketchengine.eu/tagset-jp-mecab/|in English]] | | [[https://www.korpus.cz/kontext/wordlist/result?q=~v8EQwWqiygis|list]] | [[https://taku910.github.io/mecab/|MeCab]] + [[https://unidic.ninjal.ac.jp|Unidic]] | |
^ Latvian | ✔ | ✔ | [[http://www.semti-kamols.lv/doc_upl/TagSet.html|in Latvian]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_lv&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[https://peteris.rocks/blog/latvian-part-of-speech-tagging|LVTagger]] | | ^ Latvian | ✔ | ✔ | [[http://www.semti-kamols.lv/doc_upl/TagSet.html|in Latvian]] | | [[https://www.korpus.cz/kontext/wordlist/result?q=~NiGIW6iec6eq|list]] | [[https://peteris.rocks/blog/latvian-part-of-speech-tagging|LVTagger]] | |
^ Norwegian | ✔ | ✔ | [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%) | [[https://universaldependencies.org/no/index.html#morphology|in English]]%%****%%) | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_no&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[https://web.archive.org/web/20170122231904/http://lindat.mff.cuni.cz/services/udpipe/api-reference.php|UDPipe]] | | ^ Norwegian | ✔ | ✔ | [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%) | [[https://universaldependencies.org/no/index.html#morphology|in English]]%%****%%) | [[https://www.korpus.cz/kontext/wordlist/result?q=~I6aemQOK8yiU|list]] | [[https://web.archive.org/web/20170122231904/http://lindat.mff.cuni.cz/services/udpipe/api-reference.php|UDPipe]] | |
^ Polish | ✔ | ✔ | [[http://nkjp.pl/poliqarp/help/ense2.html#x3-20002|in English]] and [[http://nkjp.pl/poliqarp/help/plse2.html#x3-20002|Polish]] | [[http://nlp.ipipan.waw.pl/%7Eadamp/Papers/2003-eacl-ws12/|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_pl&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] |[[http://sgjp.pl/morfeusz/|Morfeusz]], [[https://github.com/kwrobel-nlp/krnnt|KRNNT]] | | ^ Polish | ✔ | ✔ | [[http://nkjp.pl/poliqarp/help/ense2.html#x3-20002|in English]] and [[http://nkjp.pl/poliqarp/help/plse2.html#x3-20002|Polish]] | [[http://nlp.ipipan.waw.pl/%7Eadamp/Papers/2003-eacl-ws12/|in English]] | [[https://www.korpus.cz/kontext/wordlist/result?q=~ReKM6qg4Ic8W|list]] |[[http://sgjp.pl/morfeusz/|Morfeusz]], [[https://github.com/kwrobel-nlp/krnnt|KRNNT]] | |
^ Portuguese | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/Portuguese-Tagset.html|in Spanish]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_pt&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Portuguese | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/Portuguese-Tagset.html|in Spanish]] | | [[https://www.korpus.cz/kontext/wordlist/result?q=~saGaiAI0uEMo|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Russian | ✔ | ✔ | [[http://corpus.leeds.ac.uk/mocky/ru-table.tab|in English]] | [[http://nl.ijs.si/ME/V4/msd/html/msd-ru.html|in English]] %%***%%) | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_ru&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Russian | ✔ | ✔ | [[http://corpus.leeds.ac.uk/mocky/ru-table.tab|in English]] | [[http://nl.ijs.si/ME/V4/msd/html/msd-ru.html|in English]] %%***%%) | [[https://www.korpus.cz/kontext/wordlist/result?q=~T2sc4y6Uw2WO|list]] |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Slovak | ✔ | ✔ | [[http://korpus.sk/morpho.html/|in Slovak]] and [[https://korpus.sk/morpho_en.html/|English]] | [[https://korpus.sk/attachments/morpho_en/tagset-www.pdf|in Slovak]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_sk&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Radovan Garabík, Morče]] | | ^ Slovak | ✔ | ✔ | [[http://korpus.sk/morpho.html/|in Slovak]] and [[https://korpus.sk/morpho_en.html/|English]] | [[https://korpus.sk/attachments/morpho_en/tagset-www.pdf|in Slovak]] | [[https://www.korpus.cz/kontext/wordlist/result?q=~qkQQs4cq2IyG|list]] | [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Radovan Garabík, Morče]] | |
^ Slovene | ✔ | ✔ | | [[http://nl.ijs.si/jos/msd/html-en/josMSD-en.html|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_sl&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]] | | ^ Slovene | ✔ | ✔ | | [[http://nl.ijs.si/jos/msd/html-en/josMSD-en.html|in English]] | [[https://www.korpus.cz/kontext/wordlist/result?q=~jQMEsa8MuCQm|list]] | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]] | |
^ Serbian | ✔ | ✔ | [[https://www.sketchengine.eu/multext-east-serbian-part-of-speech-tagset/|in English]] | [[http://nl.ijs.si/ME/V4/msd/html/msd-sr.html|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_sr&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]] | | ^ Serbian | ✔ | ✔ | [[https://www.sketchengine.eu/multext-east-serbian-part-of-speech-tagset/|in English]] | [[http://nl.ijs.si/ME/V4/msd/html/msd-sr.html|in English]] | [[https://www.korpus.cz/kontext/wordlist/result?q=~3C8YOAWM0IIC|list]] | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]] | |
^ Spanish | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/spanish-tagset.txt|in English]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_es&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Spanish | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/spanish-tagset.txt|in English]] | | [[https://www.korpus.cz/kontext/wordlist/result?q=~twEuIaMu4sSQ|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Swedish | ✔ | ✔ | [[http://spraakbanken.gu.se/korp/markup/msdtags.html|in Swedish and English]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_sv&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|Stagger]] | | ^ Swedish | ✔ | ✔ | [[http://spraakbanken.gu.se/korp/markup/msdtags.html|in Swedish and English]] | | [[https://www.korpus.cz/kontext/wordlist/result?q=~hOAuiSoQMGQe|list]] | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|Stagger]] | |
^ Ukrainian | ✔ | ✔ | [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%) | [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%) | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_uk&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://ufal.mff.cuni.cz/udpipe/2|UDPipe]] | | ^ Ukrainian | ✔ | ✔ | [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%) | [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%) | [[https://www.korpus.cz/kontext/wordlist/result?q=~iQ0owcu4o2eQ|list]] | [[http://ufal.mff.cuni.cz/udpipe/2|UDPipe]] | |
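The Czech tags linked above follow the positional scheme documented by PDT, where each character of a tag encodes one grammatical category and `-` marks a category that does not apply. The sketch below assumes the 15-position PDT layout (POS, SubPOS, gender, number, case, possessor's gender and number, person, tense, degree, negation, voice, two reserve positions, variant) and maps a tag to named positions. The helper name is hypothetical; if a tag turns out to be longer than 15 characters, the extra positions simply get generic names rather than guessed ones.

```python
# Position names of the PDT positional tagset (15 positions), in order.
PDT_POSITIONS = (
    "POS", "SubPOS", "Gender", "Number", "Case",
    "PossGender", "PossNumber", "Person", "Tense", "Grade",
    "Negation", "Voice", "Reserve1", "Reserve2", "Var",
)


def decode_positional_tag(tag: str) -> dict[str, str]:
    """Map every filled position of a positional tag to its name.

    '-' (not applicable) is skipped; positions beyond the 15 PDT ones
    are labelled 'position N' instead of being interpreted.
    """
    decoded = {}
    for index, value in enumerate(tag):
        if index < len(PDT_POSITIONS):
            name = PDT_POSITIONS[index]
        else:
            name = f"position {index + 1}"
        if value != "-":
            decoded[name] = value
    return decoded


print(decode_positional_tag("NNFS1-----A----"))
# {'POS': 'N', 'SubPOS': 'N', 'Gender': 'F', 'Number': 'S', 'Case': '1', 'Negation': 'A'}
```

Here `NNFS1-----A----` is a common noun, feminine, singular, nominative, affirmative; the value codes themselves (`F`, `S`, `1`, `A`, …) are explained in the PDT documentation linked in the table.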
| |
| |
When citing a specific part of InterCorp, please use the reference shown in the corpus description in KonText, for example: | When citing a specific part of InterCorp, please use the reference shown in the corpus description in KonText, for example: |
| |
Rosen, A., Vavřín, M., Zasina, A. J. (2022). //The InterCorp Corpus – Czech((Insert languages actually used.)), version 15 of 11 November 2022//. Institute of the Czech National Corpus, Charles University, Prague 2022. Available on-line: https://kontext.korpus.cz/ | Rosen, A., Vavřín, M., Zasina, A. J. (2022). //The InterCorp Corpus – Czech((Insert languages actually used.)), version 16 of 11 November 2022//. Institute of the Czech National Corpus, Charles University, Prague 2022. Available on-line: https://kontext.korpus.cz/ |
| |
</WRAP> | </WRAP> |
| |
<WRAP round box 51%> | <WRAP round box 51%> |
[[en:cnk:intercorp]] • [[en:cnk:intercorp:verze13|Version 15]] • [[en:cnk:intercorp|InterCorp|Version 14]] • [[en:cnk:intercorp:verze13ud|Version 13ud]] • [[en:cnk:intercorp:verze13|Version 13]] • [[en:cnk:intercorp:verze12|Version 12]] • [[en:cnk:intercorp:verze11|Version 11]] • [[en:cnk:intercorp:verze10|Version 10]] • [[en:cnk:intercorp:verze9|Version 9]] • [[en:cnk:intercorp:verze8|Version 8]] • [[en:cnk:intercorp:verze7|Version 7]] • [[en:cnk:intercorp:verze6|Version 6]] • [[en:cnk:intercorp:verze5|Version 5]] • [[en:cnk:intercorp:verze4|Version 4]] • [[en:cnk:intercorp:verze3|Version 3]] • [[en:cnk:intercorp:historie|Version history]] | [[en:cnk:intercorp|InterCorp]] • [[en:cnk:intercorp:verze16ud|Version 16ud]] • [[en:cnk:intercorp:verze15|Version 15]] • [[en:cnk:intercorp:verze14|Version 14]] • [[en:cnk:intercorp:verze13ud|Version 13ud]] • [[en:cnk:intercorp:verze13|Version 13]] • [[en:cnk:intercorp:verze12|Version 12]] • [[en:cnk:intercorp:verze11|Version 11]] • [[en:cnk:intercorp:verze10|Version 10]] • [[en:cnk:intercorp:verze9|Version 9]] • [[en:cnk:intercorp:verze8|Version 8]] • [[en:cnk:intercorp:verze7|Version 7]] • [[en:cnk:intercorp:verze6|Version 6]] • [[en:cnk:intercorp:verze5|Version 5]] • [[en:cnk:intercorp:verze4|Version 4]] • [[en:cnk:intercorp:verze3|Version 3]] • [[en:cnk:intercorp:historie|Version history]] |
| |
See [[https://intercorp.korpus.cz/?lang=en|the original InterCorp site in English]]. | See [[https://intercorp.korpus.cz/?lang=en|the original InterCorp site in English]]. |
</WRAP> | </WRAP> |
| |