en:cnk:intercorp:verze10 [2017/09/14 22:14] – Lithuanian deleted alexandrrosen | en:cnk:intercorp:verze10 [2019/10/06 20:43] (current) – [Taggers/lemmatizers:] michalskrabal |
---|
====== InterCorp Release 10 ====== | ====== InterCorp Release 10 ====== |
| |
| |
| |
<WRAP right> | |
^ Name ^^ Czech -- core ^ Czech -- collections ^ other -- core ^ other -- collections ^ | ^ Name ^^ Czech -- core ^ Czech -- collections ^ other -- core ^ other -- collections ^ |
^ Positions ^ Number of tokens | 127,413,531 | 118,069,703 | 311,809,130 | 1,551,411,225 | | ^ Positions ^ Number of tokens | 127,413,531 | 118,069,703 | 311,809,130 | 1,551,411,225 | |
^ ::: ^ tagged languages | 23 ^^^^ | ^ ::: ^ tagged languages | 23 ^^^^ |
^ ::: ^ lemmatized languages | 22 ^^^^ | ^ ::: ^ lemmatized languages | 22 ^^^^ |
</WRAP> | |
| |
| |
===== Access to the texts ===== | ===== Access to the texts ===== |
After [[http://korpus.cz/english/prohlaseni-aj.php|registration]] the corpus can be searched using a web interface. The registration is valid for all ICNC corpora with public access. If you already have a user name and password for the Czech part of the Czech National Corpus, you do not need to register for the parallel corpus. | After [[http://korpus.cz/english/prohlaseni-aj.php|registration]] the corpus can be searched using a web interface. The registration is valid for all ICNC corpora with public access. If you already have a user name and password for the Czech part of the Czech National Corpus, you do not need to register for the parallel corpus. |
| |
InterCorp can be accessed via a standard web browser through [[http://kontext.korpus.cz/|KonText]], the integrated search interface of the Czech National Corpus. A tutorial in Czech is available [[kurz:uvod|here]]. | InterCorp can be accessed via a standard web browser through [[http://kontext.korpus.cz/|KonText]], the integrated search interface of the Czech National Corpus. A tutorial is available [[kurz:uvod|in Czech]], with [[en:kurz:hledani_v_paralelnim_korpusu|a brief summary in English]]. |
| |
After signing a non-profit licence agreement, texts from InterCorp can also be obtained as bilingual files containing shuffled sentence pairs. Please contact us at the address below if you are interested. | After signing a non-profit licence agreement, texts from InterCorp can also be obtained as bilingual files containing shuffled sentence pairs. Please contact us at the address below if you are interested. |
| |
A new release of InterCorp is usually published once a year. With each new release, the corpus may grow in size, in the number of languages, and in the extent and quality of annotation. Previous versions remain available (starting with release 6). | A new release of InterCorp is usually published once a year. With each new release, the corpus may grow in size, in the number of languages, and in the extent and quality of annotation. Previous versions remain available (starting with release 6). |
| |
| |
===== References ===== | ===== References ===== |
^ Catalan | ✔ | ✔ | [[http://clic.ub.edu/corpus/webfm_send/18|in English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Catalan | ✔ | ✔ | [[http://clic.ub.edu/corpus/webfm_send/18|in English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Croatian | ✔ | ✔ | [[https://github.com/ffnlp/sethr/blob/master/mte4r-upos.mapping|in English]] | | [[https://github.com/uzh/reldi|ReLDI Tagger]] | | ^ Croatian | ✔ | ✔ | [[https://github.com/ffnlp/sethr/blob/master/mte4r-upos.mapping|in English]] | | [[https://github.com/uzh/reldi|ReLDI Tagger]] | |
^ Czech | ✔ | ✔ | [[http://wiki.korpus.cz/doku.php/seznamy:tagy|in Czech]] and [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html|in English]] | [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/docc0pos.pdf|in English]] | [[http://ufal.mff.cuni.cz/morce/index.php|Morče]] | | ^ Czech | ✔ | ✔ | [[http://wiki.korpus.cz/doku.php/seznamy:tagy|in Czech]] and [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html|English]] | [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/docc0pos.pdf|in English]] | [[http://ufal.mff.cuni.cz/morce/index.php|Morče]] | |
^ Dutch | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/dutch-tagset.txt|in English]] | [[http://www.inl.nl/tst-centrale/images/stories/producten/documentatie/ehc_handleiding_nl.pdf|in Dutch]] | [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]] | | ^ Dutch | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/dutch-tagset.txt|in English]] | [[http://www.inl.nl/tst-centrale/images/stories/producten/documentatie/ehc_handleiding_nl.pdf|in Dutch]] | [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]] | |
^ English | ✔ | ✔ | [[https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html|in English]] | [[http://utkl.ff.cuni.cz/%7Erosen/public/Penn-Treebank-Tagset.pdf|in English]] + [[http://utkl.ff.cuni.cz/%7Erosen/public/PennTagAdd.html|additions]] | [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]] | | ^ English | ✔ | ✔ | [[https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html|in English]] | [[http://utkl.ff.cuni.cz/%7Erosen/public/Penn-Treebank-Tagset.pdf|in English]] + [[http://utkl.ff.cuni.cz/%7Erosen/public/PennTagAdd.html|additions]] | [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]] | |
^ Estonian | ✔ | ✔ | [[http://www.cl.ut.ee/korpused/morfliides/seletus| in Estonian and English]] | | [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]] | | ^ Estonian | ✔ | ✔ | [[http://www.cl.ut.ee/korpused/morfliides/seletus| in Estonian and English]] | | [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]] | |
^ Finnish | ✔ | ✔ | | [[http://home.gna.org/omorfi/omorfi/omorfi_user.html|in English]]%%*%%) | [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/omorfi/README.shtml|OMorFi]] +[[https://code.google.com/archive/p/hunpos/|HunPOS]] | | ^ Finnish | ✔ | ✔ | [[https://www.sketchengine.co.uk/finntreebank/|in English]]%%*%%) | [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/sources/FinnTreeBankManual.pdf|in English]]%%*%%) | [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/omorfi/README.shtml|OMorFi]] +[[https://code.google.com/archive/p/hunpos/|HunPOS]] | |
^ French | ✔ | ✔ | [[http://www.ims.uni-stuttgart.de/%7Eschmid/french-tagset.html|in English]] | | [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]] | | ^ French | ✔ | ✔ | [[http://www.ims.uni-stuttgart.de/%7Eschmid/french-tagset.html|in English]] | | [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]] | |
^ German | ✔ | ✔ | [[http://www.sketchengine.co.uk/documentation/wiki/tagsets/german_rftagger|in English]]%%**%% | [[http://utkl.ff.cuni.cz/%7Erosen/public/stts_guide.pdf|in German]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] | | ^ German | ✔ | ✔ | [[https://www.sketchengine.co.uk/German-rftagger-part-of-speech-tagset/|in English]]%%**%% | [[http://utkl.ff.cuni.cz/%7Erosen/public/stts_guide.pdf|in German]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] | |
^ Hungarian | ✔ | | [[http://nl.ijs.si/ME/Vault/V3/msd/html/msd.html#SECTION05400000000000000000|in English]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] | | ^ Hungarian | ✔ | | [[http://nl.ijs.si/ME/Vault/V3/msd/html/msd.html#SECTION05400000000000000000|in English]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] | |
^ Icelandic | ✔ | ✔ | [[http://www.malfong.is/files/ot_tagset_files_en.pdf|in English]] | | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|IceStagger]] | | ^ Icelandic | ✔ | ✔ | [[http://www.malfong.is/files/ot_tagset_files_en.pdf|in English]] | | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|IceStagger]] | |
^ Slovene | ✔ | ✔ | [[http://nl.ijs.si/ME/V4/msd/html/msd.msds-sl.html|in English and Slovene]] | [[http://nl.ijs.si/ME/V4/msd/html/msd-sl.introduction.html|in English]] | [[http://nl2.ijs.si/analyze/|ToTaLe]] | | ^ Slovene | ✔ | ✔ | [[http://nl.ijs.si/ME/V4/msd/html/msd.msds-sl.html|in English and Slovene]] | [[http://nl.ijs.si/ME/V4/msd/html/msd-sl.introduction.html|in English]] | [[http://nl2.ijs.si/analyze/|ToTaLe]] | |
^ Serbian | ✔ | ✔ | [[http://nl.ijs.si/ME/V4/msd/html/msd.msds-sr.html|in English]] | | [[https://github.com/uzh/reldi|ReLDI Tagger]] | | ^ Serbian | ✔ | ✔ | [[http://nl.ijs.si/ME/V4/msd/html/msd.msds-sr.html|in English]] | | [[https://github.com/uzh/reldi|ReLDI Tagger]] | |
^ Spanish | ✔ | ✔ | [[ftp://ftp.ims.uni-stuttgart.de/corpora/spanish-tagset.txt|in English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Spanish | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/spanish-tagset.txt|in English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Swedish | ✔ | ✔ | [[http://spraakbanken.gu.se/korp/markup/msdtags.html|in Swedish and English]] | | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|Stagger]] | | ^ Swedish | ✔ | ✔ | [[http://spraakbanken.gu.se/korp/markup/msdtags.html|in Swedish and English]] | | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|Stagger]] | |
| |
* [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|Stagger and IceStagger]] for Swedish and Icelandic (thanks to Robert Östling) | * [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|Stagger and IceStagger]] for Swedish and Icelandic (thanks to Robert Östling) |
  * [[https://github.com/uzh/reldi/tree/master/tools/tagger|ReLDI tagger]] for Croatian and Serbian (thanks to [[http://nlp.ffzg.hr/people/nikola-ljubesic/|Nikola Ljubešić]]) | * [[https://github.com/uzh/reldi/tree/master/tools/tagger|ReLDI tagger]] for Croatian and Serbian (thanks to [[http://nlp.ffzg.hr/people/nikola-ljubesic/|Nikola Ljubešić]]) |
* [[https://peteris.rocks/blog/latvian-part-of-speech-tagging/|LVTagger]] for Latvian (thanks to Peteris Rocks and Michal Škrabal) | * [[https://peteris.rocks/blog/latvian-part-of-speech-tagging/|LVTagger]] for Latvian (thanks to Pēteris Paikens and Michal Škrabal) |
| |
| |