Both sides previous revisionPrevious revisionNext revision | Previous revision |
en:cnk:intercorp:verze13 [2020/11/02 19:50] – [Morphosyntactic annotation] alexandrrosen | en:cnk:intercorp:verze13 [2024/04/18 12:54] (current) – [Morphosyntactic annotation] michalkren |
---|
These texts have been aligned automatically: search results may include a higher number of misaligned segments. Morevore, the collections do not retain all texts from the original resource. This includes texts that have no Czech counterpart. Some texts from the //Acquis Communautaire// and //Europarl// corpora have been partially corrected or omitted – as a result, they may differ in form or size if compared with the original source. A similar selection was applied to the //Open Subtitles// database, where – as an additional reduction – only a single translation was selected per title and language. On the other hand, some metadata items missing in the original resource but detectable from context or other sources have been added. | These texts have been aligned automatically: search results may include a higher number of misaligned segments. Morevore, the collections do not retain all texts from the original resource. This includes texts that have no Czech counterpart. Some texts from the //Acquis Communautaire// and //Europarl// corpora have been partially corrected or omitted – as a result, they may differ in form or size if compared with the original source. A similar selection was applied to the //Open Subtitles// database, where – as an additional reduction – only a single translation was selected per title and language. On the other hand, some metadata items missing in the original resource but detectable from context or other sources have been added. |
| |
Each text has a Czech counterpart. As a result, Czech is the pivot language: for every text there is a single Czech version (original or translation), aligned with one or more foreign-language versions. The total size of the available part of InterCorp in release 13 from December 2020 is 328 mil. words in the aligned foreign language texts in the core part and 1,223 mil. words in the collections. The number of words in the Czech texts is 114 mil. in the core part and 90 mil. in the collections (see [[en:cnk:intercorp:historie|Version history]]). The share of the core and the collections in the corpus can be seen in the following charts. The charts show the volumes in millions of words. | Each text has a Czech counterpart. As a result, Czech is the pivot language: for every text there is a single Czech version (original or translation), aligned with one or more foreign-language versions. The total size of the available part of InterCorp in release 13 published in November 2020 is 328 mil. words in the aligned foreign language texts in the core part and 1,223 mil. words in the collections. The number of words in the Czech texts is 114 mil. in the core part and 90 mil. in the collections (see [[en:cnk:intercorp:historie|Version history]]). The share of the core and the collections in the corpus can be seen in the following charts. The charts show the volumes in millions of words. |
| |
| |
[{{:cnk:intercorp:intercorp_wordcounts_v13.png|Setup of the parallel corpus – the core and collections}}] | [{{:cnk:intercorp:intercorp_wordcounts_v13.png|Setup of the parallel corpus – the core and collections}}] \\ |
| |
[{{:cnk:intercorp:intercorp_wordcounts2_v13.png|Setup of the parallel corpus – the core}}] | [{{:cnk:intercorp:intercorp_wordcounts2_v13.png|Setup of the parallel corpus – the core}}] \\ |
| |
[{{:cnk:intercorp:intercorp_wordcounts3_v13.png|Setup of the parallel corpus – collections}}] | [{{:cnk:intercorp:intercorp_wordcounts3_v13.png|Setup of the parallel corpus – collections}}] |
Texts in the following languages have received some morphosyntactic annotation. The format and often even the meaning of categories encoded in the morphosyntactic tags differs in most languages. Thus for each tagged language we provide a link to the tagset description. After selecting CQL as the query type, the tagset description is available also from the KonText search interface. | Texts in the following languages have received some morphosyntactic annotation. The format and often even the meaning of categories encoded in the morphosyntactic tags differs in most languages. Thus for each tagged language we provide a link to the tagset description. After selecting CQL as the query type, the tagset description is available also from the KonText search interface. |
| |
^ Language ^ Tags ^ Lemmas ^ Brief description ^ Detailed description ^ Tags in the corpus ^ Tool ^ | ^ Language ^ Tags ^ Lemmas ^ Brief description ^ Detailed description ^ Tool ^ |
^ Belarusian | ✔ | ✔ | [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%) | [[https://universaldependencies.org/be/index.html#morphology|in English]]%%****%%) | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_be&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://ufal.mff.cuni.cz/udpipe/2|UDPipe]] | | ^ Belarusian | ✔ | ✔ | [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%) | [[https://universaldependencies.org/be/index.html#morphology|in English]]%%****%%) | [[http://ufal.mff.cuni.cz/udpipe/2|UDPipe]] | |
^ Bulgarian | ✔ | ✔ | [[https://www.sketchengine.eu/bulgarian-treebank-part-of-speech-tagset/|in English]] | [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/BTB-TR03_BulTreeBank_morphosyntactic_tag.pdf|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_bg&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Bulgarian | ✔ | ✔ | [[https://www.sketchengine.eu/bulgarian-treebank-part-of-speech-tagset/|in English]] | [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/BTB-TR03_BulTreeBank_morphosyntactic_tag.pdf|in English]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Catalan | ✔ | ✔ | [[http://clic.ub.edu/corpus/webfm_send/18|in English]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_ca&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Catalan | ✔ | ✔ | [[http://clic.ub.edu/corpus/webfm_send/18|in English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Chinese | ✔ | | [[https://www.sketchengine.eu/chinese-penn-treebank-part-of-speech-tagset/|in English]] | [[https://repository.upenn.edu/cgi/viewcontent.cgi?article=1039&context=ircs_reports|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_zh&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[https://www.sutd.edu.sg/cmsresource/faculty/yuezhang/zpar.html|ZPar v0.7.5]] | | ^ Chinese | ✔ | | [[https://www.sketchengine.eu/chinese-penn-treebank-part-of-speech-tagset/|in English]] | [[https://repository.upenn.edu/cgi/viewcontent.cgi?article=1039&context=ircs_reports|in English]] | [[https://www.sutd.edu.sg/cmsresource/faculty/yuezhang/zpar.html|ZPar v0.7.5]] | |
^ Croatian | ✔ | ✔ | [[https://github.com/ffnlp/sethr/blob/master/mte4r-upos.mapping|in English]] | [[http://nlp.ffzg.hr/data/tagging/msd-hr.html|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_hr&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]] | | ^ Croatian | ✔ | ✔ | [[https://github.com/ffnlp/sethr/blob/master/mte4r-upos.mapping|in English]] | [[http://nlp.ffzg.hr/data/tagging/msd-hr.html|in English]] | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]] | |
^ Czech | ✔ | ✔ | [[http://wiki.korpus.cz/doku.php/seznamy:tagy|in Czech]] and [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html|English]] | [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/docc0pos.pdf|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_cs&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://ufal.mff.cuni.cz/morce/index.php|Morče]] | | ^ Czech | ✔ | ✔ | [[http://wiki.korpus.cz/doku.php/seznamy:tagy|in Czech]] and [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html|English]] | [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/docc0pos.pdf|in English]] | [[http://ufal.mff.cuni.cz/morce/index.php|Morče]] | |
^ Dutch | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/dutch-tagset.txt|in English]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_nl&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Dutch | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/dutch-tagset.txt|in English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ English | ✔ | ✔ | [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/PennTreebankTags.pdf|in English]] | [[http://utkl.ff.cuni.cz/%7Erosen/public/Penn-Treebank-Tagset.pdf|in English]] + [[http://utkl.ff.cuni.cz/%7Erosen/public/PennTagAdd.html|additions]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_en&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ English | ✔ | ✔ | [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/PennTreebankTags.pdf|in English]] | [[http://utkl.ff.cuni.cz/%7Erosen/public/Penn-Treebank-Tagset.pdf|in English]] + [[http://utkl.ff.cuni.cz/%7Erosen/public/PennTagAdd.html|additions]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Estonian | ✔ | ✔ | [[http://www.cl.ut.ee/korpused/morfliides/seletus|in Estonian and English]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_et&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Estonian | ✔ | ✔ | [[http://www.cl.ut.ee/korpused/morfliides/seletus|in Estonian and English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Finnish | ✔ | ✔ | [[https://www.sketchengine.co.uk/finntreebank|in English]]%%*%%) | [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/sources/FinnTreeBankManual.pdf|in English]]%%*%%) | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_fi&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] |[[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/omorfi/README.shtml|OMorFi]] +[[https://code.google.com/archive/p/hunpos/|HunPOS]] | | ^ Finnish | ✔ | ✔ | [[https://www.sketchengine.co.uk/finntreebank|in English]]%%*%%) | [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/sources/FinnTreeBankManual.pdf|in English]]%%*%%) |[[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/omorfi/README.shtml|OMorFi]] +[[https://code.google.com/archive/p/hunpos/|HunPOS]] | |
^ French | ✔ | ✔ | [[https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/french-tagset.html|in English]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_fr&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ French | ✔ | ✔ | [[https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/french-tagset.html|in English]] | |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ German | ✔ | ✔ | [[https://www.sketchengine.co.uk/German-rftagger-part-of-speech-tagset/|in English]] %%**%%) | [[http://utkl.ff.cuni.cz/%7Erosen/public/stts_guide.pdf|in German]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_de&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] | | ^ German | ✔ | ✔ | [[https://www.sketchengine.co.uk/German-rftagger-part-of-speech-tagset/|in English]] %%**%%) | [[http://utkl.ff.cuni.cz/%7Erosen/public/stts_guide.pdf|in German]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] | |
^ Hungarian | ✔ | | | [[http://www.inf.u-szeged.hu/projectdirs/hlt/en/Szeged%20Treebank%202.0_en.html|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_hu&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] | | ^ Hungarian | ✔ | | | [[http://www.inf.u-szeged.hu/projectdirs/hlt/en/Szeged%20Treebank%202.0_en.html|in English]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] | |
^ Icelandic | ✔ | ✔ | [[http://www.malfong.is/files/ot_tagset_files_en.pdf|in English]] | [[http://nlp.cs.ru.is/pdf/Tagset.pdf|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_is&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|IceStagger]] | | ^ Icelandic | ✔ | ✔ | [[http://www.malfong.is/files/ot_tagset_files_en.pdf|in English]] | [[http://nlp.cs.ru.is/pdf/Tagset.pdf|in English]] | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|IceStagger]] | |
^ Italian | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-tagset.txt|in English]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_it&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Italian | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-tagset.txt|in English]] | |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Japanese | ✔ | ✔ | [[https://www.sketchengine.eu/tagset-jp-mecab/|in English]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_ja&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[https://taku910.github.io/mecab/|MeCab]] + [[https://unidic.ninjal.ac.jp|Unidic]] | | ^ Japanese | ✔ | ✔ | [[https://www.sketchengine.eu/tagset-jp-mecab/|in English]] | | [[https://taku910.github.io/mecab/|MeCab]] + [[https://unidic.ninjal.ac.jp|Unidic]] | |
^ Latvian | ✔ | ✔ | [[http://www.semti-kamols.lv/doc_upl/TagSet.html|in Latvian]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_lv&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[https://peteris.rocks/blog/latvian-part-of-speech-tagging|LVTagger]] | | ^ Latvian | ✔ | ✔ | [[http://www.semti-kamols.lv/doc_upl/TagSet.html|in Latvian]] | | [[https://peteris.rocks/blog/latvian-part-of-speech-tagging|LVTagger]] | |
^ Norwegian | ✔ | ✔ | [[http://tekstlab.uio.no/obt-ny/english/tagset.html|in English]] and [[http://tekstlab.uio.no/obt-ny/index.html|Norwegian]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_no&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[https://github.com/noklesta/The-Oslo-Bergen-Tagger|Oslo-Bergen Tagger]] | | ^ Norwegian | ✔ | ✔ | [[http://tekstlab.uio.no/obt-ny/english/tagset.html|in English]] and [[http://tekstlab.uio.no/obt-ny/index.html|Norwegian]] | | [[https://github.com/noklesta/The-Oslo-Bergen-Tagger|Oslo-Bergen Tagger]] | |
^ Polish | ✔ | ✔ | [[http://nkjp.pl/poliqarp/help/ense2.html#x3-20002|in English]] and [[http://nkjp.pl/poliqarp/help/plse2.html#x3-20002|Polish]] | [[http://nlp.ipipan.waw.pl/%7Eadamp/Papers/2003-eacl-ws12/|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_pl&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] |[[http://sgjp.pl/morfeusz/|Morfeusz]], [[https://github.com/kwrobel-nlp/krnnt|KRNNT]] | | ^ Polish | ✔ | ✔ | [[http://nkjp.pl/poliqarp/help/ense2.html#x3-20002|in English]] and [[http://nkjp.pl/poliqarp/help/plse2.html#x3-20002|Polish]] | [[http://nlp.ipipan.waw.pl/%7Eadamp/Papers/2003-eacl-ws12/|in English]] |[[http://sgjp.pl/morfeusz/|Morfeusz]], [[https://github.com/kwrobel-nlp/krnnt|KRNNT]] | |
^ Portuguese | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/Portuguese-Tagset.html|in Spanish]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_pt&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Portuguese | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/Portuguese-Tagset.html|in Spanish]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Russian | ✔ | ✔ | [[http://corpus.leeds.ac.uk/mocky/ru-table.tab|in English]] | [[http://nl.ijs.si/ME/V4/msd/html/msd-ru.html|in English]] %%***%%) | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_ru&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Russian | ✔ | ✔ | [[http://corpus.leeds.ac.uk/mocky/ru-table.tab|in English]] | [[http://nl.ijs.si/ME/V4/msd/html/msd-ru.html|in English]] %%***%%) |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Slovak | ✔ | ✔ | [[http://korpus.sk/morpho.html/|in Slovak]] and [[https://korpus.sk/morpho_en.html/|English]] | [[https://korpus.sk/attachments/morpho_en/tagset-www.pdf|in Slovak]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_sk&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Radovan Garabík, Morče]] | | ^ Slovak | ✔ | ✔ | [[http://korpus.sk/morpho.html/|in Slovak]] and [[https://korpus.sk/morpho_en.html/|English]] | [[https://korpus.sk/attachments/morpho_en/tagset-www.pdf|in Slovak]] | [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Radovan Garabík, Morče]] | |
^ Slovene | ✔ | ✔ | [[https://www.sketchengine.eu/slovene-tagset-multext-east-v3/|in English]] | [[http://nl.ijs.si/jos/msd/html-en/josMSD-en.html|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_sl&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]] | | ^ Slovene | ✔ | ✔ | | [[http://nl.ijs.si/jos/msd/html-en/josMSD-en.html|in English]] | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]] | |
^ Serbian | ✔ | ✔ | [[https://www.sketchengine.eu/multext-east-serbian-part-of-speech-tagset/|in English]] | [[http://nl.ijs.si/ME/V4/msd/html/msd-sr.html|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_sr&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]] | | ^ Serbian | ✔ | ✔ | [[https://www.sketchengine.eu/multext-east-serbian-part-of-speech-tagset/|in English]] | [[http://nl.ijs.si/ME/V4/msd/html/msd-sr.html|in English]] | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]] | |
^ Spanish | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/spanish-tagset.txt|in English]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_es&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Spanish | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/spanish-tagset.txt|in English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Swedish | ✔ | ✔ | [[http://spraakbanken.gu.se/korp/markup/msdtags.html|in Swedish and English]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_sv&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|Stagger]] | | ^ Swedish | ✔ | ✔ | [[http://spraakbanken.gu.se/korp/markup/msdtags.html|in Swedish and English]] | | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|Stagger]] | |
^ Ukrainian | ✔ | ✔ | | [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%) | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_uk&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://ufal.mff.cuni.cz/udpipe/2|UDPipe]] | | ^ Ukrainian | ✔ | ✔ | | [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%) | [[http://ufal.mff.cuni.cz/udpipe/2|UDPipe]] | |
| |
| |