en:cnk:intercorp:verze11 [2019/12/20 00:10] – [Morphosyntactic annotation] alexandrrosen | en:cnk:intercorp:verze11 [2019/12/20 00:22] – [InterCorp Release 12] alexandrrosen |
---|
~~NOTOC~~ | ~~NOTOC~~ |
====== InterCorp Release 11 ====== | ====== InterCorp Release 12 ====== |
| |
^ Name ^^ Czech -- core ^ Czech -- collections ^ other -- core ^ other -- collections ^ | ^ Name ^^ Czech -- core ^ Czech -- collections ^ other -- core ^ other -- collections ^ |
^ Positions ^ Number of tokens | 132,508,429 | 115,574,528 | 340,554,768 | 1,550,923,096 | | ^ Positions ^ Number of tokens | 137 059 021 | 116 673 027 | 373 873 819 | 1 549 570 665 | |
^ ::: ^ Number of word forms | 106,898,538 | 88,872,779 | 283,075,338 | 1,225,361,750 | | ^ ::: ^ Number of word forms | 110 588 784 | 89 819 765 | 310 914 295 | 1 222 868 666 | |
^ Structural attributes ^ Number of documents | 1,564 | 28 | 3,494 | 261 | | ^ Structural attributes ^ Number of documents | 1 619 | 30 | 3 806 | 281 | |
^ ::: ^ Number of texts | 1,507 | 111,672 | 3,232 | 1,841,341 | | ^ ::: ^ Number of texts | 1 619 | 111 951 | 3 806 | 1 843 489 | |
^ ::: ^ Number of sentences | 9,193,433 | 13,556,382 | 21,000,997 | 142,734,659 | | ^ ::: ^ Number of sentences | 9 518 229 | 13 606 183 | 23 076 128 | 143 165 959 | |
^ Further information ^ reference | YES ^^^^ | ^ Further information ^ reference | YES ^^^^ |
^ ::: ^ representative | NO ^^^^ | ^ ::: ^ representative | NO ^^^^ |
^ ::: ^ publication date | 2018 ^^^^ | ^ ::: ^ publication date | 2019 ^^^^ |
^ ::: ^ foreign languages | 39 ^^^^ | ^ ::: ^ foreign languages | 40 ^^^^ |
^ ::: ^ tagged languages | 26 ^^^^ | ^ ::: ^ tagged languages | 26 ^^^^ |
^ ::: ^ lemmatized languages | 25 ^^^^ | ^ ::: ^ lemmatized languages | 25 ^^^^ |
^ English | ✔ | ✔ | [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/PennTreebankTags.pdf|in English]] | [[http://utkl.ff.cuni.cz/%7Erosen/public/Penn-Treebank-Tagset.pdf|in English]] + [[http://utkl.ff.cuni.cz/%7Erosen/public/PennTagAdd.html|additions]] | [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]] | | ^ English | ✔ | ✔ | [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/PennTreebankTags.pdf|in English]] | [[http://utkl.ff.cuni.cz/%7Erosen/public/Penn-Treebank-Tagset.pdf|in English]] + [[http://utkl.ff.cuni.cz/%7Erosen/public/PennTagAdd.html|additions]] | [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]] | |
^ Estonian | ✔ | ✔ | [[http://www.cl.ut.ee/korpused/morfliides/seletus|in Estonian and English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Estonian | ✔ | ✔ | [[http://www.cl.ut.ee/korpused/morfliides/seletus|in Estonian and English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Finnish | ✔ | ✔ | [[https://www.sketchengine.co.uk/finntreebank/|in English]]%%*%%) | [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/sources/FinnTreeBankManual.pdf|in English]]%%*%%) | [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/omorfi/README.shtml|OMorFi]] +[[https://code.google.com/archive/p/hunpos/|HunPOS]] | | ^ Finnish | ✔ | ✔ | [[https://www.sketchengine.co.uk/finntreebank/|in English]]%%*%%) | [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/sources/FinnTreeBankManual.pdf|in English]]%%*%%) | [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/omorfi/README.shtml|OMorFi]] + [[https://code.google.com/archive/p/hunpos/|HunPOS]] | |
^ French | ✔ | ✔ | [[http://www.ims.uni-stuttgart.de/%7Eschmid/french-tagset.html|in English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ French | ✔ | ✔ | [[http://www.ims.uni-stuttgart.de/%7Eschmid/french-tagset.html|in English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ German | ✔ | ✔ | [[https://www.sketchengine.co.uk/German-rftagger-part-of-speech-tagset/|in English]]%%**%% | [[http://utkl.ff.cuni.cz/%7Erosen/public/stts_guide.pdf|in German]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] | | ^ German | ✔ | ✔ | [[https://www.sketchengine.co.uk/German-rftagger-part-of-speech-tagset/|in English]]%%**%% | [[http://utkl.ff.cuni.cz/%7Erosen/public/stts_guide.pdf|in German]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] | |
^ Icelandic | ✔ | ✔ | [[http://www.malfong.is/files/ot_tagset_files_en.pdf|in English]] | | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|IceStagger]] | | ^ Icelandic | ✔ | ✔ | [[http://www.malfong.is/files/ot_tagset_files_en.pdf|in English]] | | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|IceStagger]] | |
^ Italian | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-tagset.txt|in English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Italian | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-tagset.txt|in English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Japanese | ✔ | ✔ | [[https://www.sketchengine.eu/tagset-jp-mecab/|in English]] | | [[https://taku910.github.io/mecab/|MeCab]] + [[https://unidic.ninjal.ac.jp|Unidic]] | | ^ Japanese | ✔ | ✔ | [[https://www.sketchengine.eu/tagset-jp-mecab/|in English]] | | [[https://taku910.github.io/mecab/|MeCab]] + [[https://unidic.ninjal.ac.jp|Unidic]] | |
^ Latvian | ✔ | ✔ | [[http://www.semti-kamols.lv/doc_upl/TagSet.html|in Latvian]] | | [[https://peteris.rocks/blog/latvian-part-of-speech-tagging|LVTagger]] | | ^ Latvian | ✔ | ✔ | [[http://www.semti-kamols.lv/doc_upl/TagSet.html|in Latvian]] | | [[https://peteris.rocks/blog/latvian-part-of-speech-tagging|LVTagger]] | |
^ Norwegian | ✔ | ✔ | [[http://tekstlab.uio.no/obt-ny/english/tagset.html|in English]] and [[http://tekstlab.uio.no/obt-ny/index.html|Norwegian]] | | [[https://visl.sdu.dk/remoting.html|VISL]] | | ^ Norwegian | ✔ | ✔ | [[http://tekstlab.uio.no/obt-ny/english/tagset.html|in English]] and [[http://tekstlab.uio.no/obt-ny/index.html|Norwegian]] | | [[https://visl.sdu.dk/remoting.html|VISL]] | |
* [[http://ufal.mff.cuni.cz/morfflex|MorfFlex]], [[http://ufal.mff.cuni.cz/morce/index.php|Morče]] and [[https://is.cuni.cz/webapps/zzp/download/140018093/?back_id=10|LanGr]] for Czech | * [[http://ufal.mff.cuni.cz/morfflex|MorfFlex]], [[http://ufal.mff.cuni.cz/morce/index.php|Morče]] and [[https://is.cuni.cz/webapps/zzp/download/140018093/?back_id=10|LanGr]] for Czech |
* [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]] for Bulgarian, Dutch, English, Estonian (thanks to Helmut Schmid), French, Italian, Portuguese (thanks to Pablo Gamallo), Russian and Spanish | * [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]] for Bulgarian, Dutch, English, Estonian (thanks to Helmut Schmid), French, Italian, Portuguese (thanks to Pablo Gamallo), Russian and Spanish |
* [[http://sgjp.pl/morfeusz/|Morfeusz]] and [[http://nlp.pwr.wroc.pl/takipi/|TaKIPI]] for Polish | * [[http://sgjp.pl/morfeusz/|Morfeusz]] and [[https://github.com/kwrobel-nlp/krnnt|KRNNT]] for Polish |
* [[http://code.google.com/p/hunpos/|HunPOS]] for Hungarian and other languages | * [[http://code.google.com/p/hunpos/|HunPOS]] for Hungarian and other languages |
* [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Tagger for Slovak]] (thanks to Radovan Garabík) | * [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Tagger for Slovak]] (thanks to Radovan Garabík) |
* [[https://peteris.rocks/blog/latvian-part-of-speech-tagging/|LVTagger]] for Latvian (thanks to Pēteris Paikens and Michal Škrabal) | * [[https://peteris.rocks/blog/latvian-part-of-speech-tagging/|LVTagger]] for Latvian (thanks to Pēteris Paikens and Michal Škrabal) |
* [[http://ufal.mff.cuni.cz/udpipe|UDPipe]] for Belarusian and Ukrainian (thanks to Bohdan Moskalevskyi) | * [[http://ufal.mff.cuni.cz/udpipe|UDPipe]] for Belarusian and Ukrainian (thanks to Bohdan Moskalevskyi) |
* [[https://taku910.github.io/mecab/|MeCab]] and [[https://osdn.net/projects/unidic/|Unidic]] for Japanese | * [[https://taku910.github.io/mecab/|MeCab]] and [[https://osdn.net/projects/unidic/|Unidic]] for Japanese (thanks to Adam Nohejl) |
| * [[https://www.sutd.edu.sg/cmsresource/faculty/yuezhang/zpar.html|ZPar]] for Chinese (thanks to Vlastimil Dobečka) |
| |
| |
| |
<WRAP round box 51%> | <WRAP round box 51%> |
[[en:cnk:intercorp|InterCorp]] • [[en:cnk:intercorp:verze10|Version 10]] • [[en:cnk:intercorp:verze9|Version 9]] • [[en:cnk:intercorp:verze8|Version 8]] • [[en:cnk:intercorp:verze7|Version 7]] • [[en:cnk:intercorp:verze6|Version 6]] • [[en:cnk:intercorp:verze5|Version 5]] • [[en:cnk:intercorp:verze4|Version 4]] • [[en:cnk:intercorp:verze3|Version 3]] • [[en:cnk:intercorp:historie|Version history]] | [[en:cnk:intercorp|InterCorp]] • [[en:cnk:intercorp:verze11|Version 11]] • [[en:cnk:intercorp:verze10|Version 10]] • [[en:cnk:intercorp:verze9|Version 9]] • [[en:cnk:intercorp:verze8|Version 8]] • [[en:cnk:intercorp:verze7|Version 7]] • [[en:cnk:intercorp:verze6|Version 6]] • [[en:cnk:intercorp:verze5|Version 5]] • [[en:cnk:intercorp:verze4|Version 4]] • [[en:cnk:intercorp:verze3|Version 3]] • [[en:cnk:intercorp:historie|Version history]] |
| |
See [[http://ucnk.ff.cuni.cz/intercorp/?lang=en|the original InterCorp site in English]]. | See [[https://intercorp.korpus.cz/?lang=en|the original InterCorp site in English]]. |
</WRAP> | </WRAP> |
| |