en:cnk:intercorp:verze14 [2022/01/14 15:34] – [Structural attributes] alexandrrosen | en:cnk:intercorp:verze14 [2024/04/18 16:00] (current) – [Morphosyntactic annotation] michalkren |
---|
====== InterCorp Release 14 ====== | ====== InterCorp Release 14 ====== |
| |
numbers: TODO! | |
| |
^ Name ^^ Czech -- core ^ Czech -- collections ^ other -- core ^ other -- collections ^ | ^ Name ^^ Czech -- core ^ Czech -- collections ^ other -- core ^ other -- collections ^ |
^ Positions ^ Number of tokens | 141,032,521 | 116,673,043 | 394,042,551 | 1,550,071,364 | | ^ Positions ^ Number of tokens | 145,640,866 | 116,673,038 | 418,967,492 | 1,548,425,287 | |
^ ::: ^ Number of word forms | 113,838,505 | 89,819,773 | 327,968,369 | 1,223,270,610 | | ^ ::: ^ Number of word forms | 117,606,467 | 89,819,772 | 348,771,933 | 1,223,221,264 | |
^ Structural attributes ^ Number of documents | 1,657 | 30 | 3,993 | 282 | | ^ Structural attributes ^ Number of documents | 1,708 | 30 | 4,220 | 282 | |
^ ::: ^ Number of texts | 1,657 | 111,951 | 3,993 | 1,843,528 | | ^ ::: ^ Number of texts | 1,708 | 111,951 | 4,220 | 1,843,528 | |
^ ::: ^ Number of sentences | 9,782,001 | 13,606,183 | 24,305,621 | 143,195,566 | | ^ ::: ^ Number of sentences | 10,095,074 | 13,606,183 | 25,872,393 | 143,195,566 | |
^ Further information ^ reference | YES ^^^^ | ^ Further information ^ reference | YES ^^^^ |
^ ::: ^ representative | NO ^^^^ | ^ ::: ^ representative | NO ^^^^ |
^ hi ^ Hindi | 409 | 0 | 0 | 0 | 0 | 0 | 0 | 409 | | ^ hi ^ Hindi | 409 | 0 | 0 | 0 | 0 | 0 | 0 | 409 | |
^ hr ^ Croatian | 22 736 | 0 | 0 | 0 | 0 | 19 048 | 571 | 42 356 | | ^ hr ^ Croatian | 22 736 | 0 | 0 | 0 | 0 | 19 048 | 571 | 42 356 | |
^ hu ^ Hungarian | 110 | 0 | 0 | 0 | 0 | 0 | 0 | 110 | | ^ hs ^ Upper Sorbian | 110 | 0 | 0 | 0 | 0 | 0 | 0 | 110 | |
^ hs ^ Upper Sorbian | 6 444 | 0 | 0 | 17 852 | 12 198 | 21 115 | 0 | 57 609 | | ^ hu ^ Hungarian | 6 444 | 0 | 0 | 17 852 | 12 198 | 21 115 | 0 | 57 609 | |
^ is ^ Icelandic | 0 | 0 | 0 | 0 | 0 | 1 581 | 0 | 1 581 | | ^ is ^ Icelandic | 0 | 0 | 0 | 0 | 0 | 1 581 | 0 | 1 581 | |
^ it ^ Italian | 15 741 | 1 252 | 2 747 | 23 771 | 15 494 | 14 700 | 684 | 74 389 | | ^ it ^ Italian | 15 741 | 1 252 | 2 747 | 23 771 | 15 494 | 14 700 | 684 | 74 389 | |
Texts in the following languages have received some morphosyntactic annotation. The format, and often even the meaning, of the categories encoded in the morphosyntactic tags differ across most languages. For each tagged language we therefore provide a link to the tagset description. After selecting CQL as the query type, the tagset description is also available from the KonText search interface. | Texts in the following languages have received some morphosyntactic annotation. The format, and often even the meaning, of the categories encoded in the morphosyntactic tags differ across most languages. For each tagged language we therefore provide a link to the tagset description. After selecting CQL as the query type, the tagset description is also available from the KonText search interface. |
| |
^ Language ^ Tags ^ Lemmas ^ Brief description ^ Detailed description ^ Tags in the corpus ^ Tool ^ | ^ Language ^ Tags ^ Lemmas ^ Brief description ^ Detailed description ^ Tool ^ |
^ Belarusian | ✔ | ✔ | [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%) | [[https://universaldependencies.org/be/index.html#morphology|in English]]%%****%%) | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_be&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://ufal.mff.cuni.cz/udpipe/2|UDPipe]] | | ^ Belarusian | ✔ | ✔ | [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%) | [[https://universaldependencies.org/be/index.html#morphology|in English]]%%****%%) | [[http://ufal.mff.cuni.cz/udpipe/2|UDPipe]] | |
^ Bulgarian | ✔ | ✔ | [[https://www.sketchengine.eu/bulgarian-treebank-part-of-speech-tagset/|in English]] | [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/BTB-TR03_BulTreeBank_morphosyntactic_tag.pdf|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_bg&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Bulgarian | ✔ | ✔ | [[https://www.sketchengine.eu/bulgarian-treebank-part-of-speech-tagset/|in English]] | [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/BTB-TR03_BulTreeBank_morphosyntactic_tag.pdf|in English]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Catalan | ✔ | ✔ | [[http://clic.ub.edu/corpus/webfm_send/18|in English]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_ca&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Catalan | ✔ | ✔ | [[http://clic.ub.edu/corpus/webfm_send/18|in English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Chinese | ✔ | | [[https://www.sketchengine.eu/chinese-penn-treebank-part-of-speech-tagset/|in English]] | [[https://repository.upenn.edu/cgi/viewcontent.cgi?article=1039&context=ircs_reports|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_zh&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[https://www.sutd.edu.sg/cmsresource/faculty/yuezhang/zpar.html|ZPar v0.7.5]] | | ^ Chinese | ✔ | | [[https://www.sketchengine.eu/chinese-penn-treebank-part-of-speech-tagset/|in English]] | [[https://repository.upenn.edu/cgi/viewcontent.cgi?article=1039&context=ircs_reports|in English]] | [[https://www.sutd.edu.sg/cmsresource/faculty/yuezhang/zpar.html|ZPar v0.7.5]] | |
^ Croatian | ✔ | ✔ | [[https://github.com/ffnlp/sethr/blob/master/mte4r-upos.mapping|in English]] | [[http://nlp.ffzg.hr/data/tagging/msd-hr.html|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_hr&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]] | | ^ Croatian | ✔ | ✔ | [[https://github.com/ffnlp/sethr/blob/master/mte4r-upos.mapping|in English]] | [[http://nlp.ffzg.hr/data/tagging/msd-hr.html|in English]] | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]] | |
^ Czech | ✔ | ✔ | [[http://wiki.korpus.cz/doku.php/seznamy:tagy|in Czech]] and [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html|English]] | [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/docc0pos.pdf|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_cs&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://ufal.mff.cuni.cz/morce/index.php|Morče]] | | ^ Czech | ✔ | ✔ | [[http://wiki.korpus.cz/doku.php/seznamy:tagy|in Czech]] and [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html|English]] | [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/docc0pos.pdf|in English]] | [[http://ufal.mff.cuni.cz/morce/index.php|Morče]] | |
^ Dutch | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/dutch-tagset.txt|in English]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_nl&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Dutch | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/dutch-tagset.txt|in English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ English | ✔ | ✔ | [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/PennTreebankTags.pdf|in English]] | [[http://utkl.ff.cuni.cz/%7Erosen/public/Penn-Treebank-Tagset.pdf|in English]] + [[http://utkl.ff.cuni.cz/%7Erosen/public/PennTagAdd.html|additions]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_en&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ English | ✔ | ✔ | [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/PennTreebankTags.pdf|in English]] | [[http://utkl.ff.cuni.cz/%7Erosen/public/Penn-Treebank-Tagset.pdf|in English]] + [[http://utkl.ff.cuni.cz/%7Erosen/public/PennTagAdd.html|additions]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Estonian | ✔ | ✔ | [[http://www.cl.ut.ee/korpused/morfliides/seletus|in Estonian and English]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_et&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Estonian | ✔ | ✔ | [[http://www.cl.ut.ee/korpused/morfliides/seletus|in Estonian and English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Finnish | ✔ | ✔ | [[https://www.sketchengine.co.uk/finntreebank|in English]]%%*%%) | [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/sources/FinnTreeBankManual.pdf|in English]]%%*%%) | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_fi&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] |[[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/omorfi/README.shtml|OMorFi]] +[[https://code.google.com/archive/p/hunpos/|HunPOS]] | | ^ Finnish | ✔ | ✔ | [[https://www.sketchengine.co.uk/finntreebank|in English]]%%*%%) | [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/sources/FinnTreeBankManual.pdf|in English]]%%*%%) | [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/omorfi/README.shtml|OMorFi]] +[[https://code.google.com/archive/p/hunpos/|HunPOS]] | |
^ French | ✔ | ✔ | [[https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/french-tagset.html|in English]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_fr&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ French | ✔ | ✔ | [[https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/french-tagset.html|in English]] | |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ German | ✔ | ✔ | [[https://www.sketchengine.co.uk/German-rftagger-part-of-speech-tagset/|in English]] %%**%%) | [[http://utkl.ff.cuni.cz/%7Erosen/public/stts_guide.pdf|in German]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_de&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] | | ^ German | ✔ | ✔ | [[https://www.sketchengine.co.uk/German-rftagger-part-of-speech-tagset/|in English]] %%**%%) | [[http://utkl.ff.cuni.cz/%7Erosen/public/stts_guide.pdf|in German]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] | |
^ Hungarian | ✔ | | | [[http://www.inf.u-szeged.hu/projectdirs/hlt/en/Szeged%20Treebank%202.0_en.html|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_hu&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] | | ^ Hungarian | ✔ | | | [[http://www.inf.u-szeged.hu/projectdirs/hlt/en/Szeged%20Treebank%202.0_en.html|in English]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] | |
^ Icelandic | ✔ | ✔ | [[http://www.malfong.is/files/ot_tagset_files_en.pdf|in English]] | [[http://nlp.cs.ru.is/pdf/Tagset.pdf|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_is&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|IceStagger]] | | ^ Icelandic | ✔ | ✔ | [[http://www.malfong.is/files/ot_tagset_files_en.pdf|in English]] | [[http://nlp.cs.ru.is/pdf/Tagset.pdf|in English]] | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|IceStagger]] | |
^ Italian | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-tagset.txt|in English]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_it&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Italian | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-tagset.txt|in English]] | |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Japanese | ✔ | ✔ | [[https://www.sketchengine.eu/tagset-jp-mecab/|in English]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_ja&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[https://taku910.github.io/mecab/|MeCab]] + [[https://unidic.ninjal.ac.jp|Unidic]] | | ^ Japanese | ✔ | ✔ | [[https://www.sketchengine.eu/tagset-jp-mecab/|in English]] | | [[https://taku910.github.io/mecab/|MeCab]] + [[https://unidic.ninjal.ac.jp|Unidic]] | |
^ Latvian | ✔ | ✔ | [[http://www.semti-kamols.lv/doc_upl/TagSet.html|in Latvian]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_lv&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[https://peteris.rocks/blog/latvian-part-of-speech-tagging|LVTagger]] | | ^ Latvian | ✔ | ✔ | [[http://www.semti-kamols.lv/doc_upl/TagSet.html|in Latvian]] | | [[https://peteris.rocks/blog/latvian-part-of-speech-tagging|LVTagger]] | |
^ Norwegian | ✔ | ✔ | [[http://tekstlab.uio.no/obt-ny/english/tagset.html|in English]] and [[http://tekstlab.uio.no/obt-ny/index.html|Norwegian]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_no&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[https://github.com/noklesta/The-Oslo-Bergen-Tagger|Oslo-Bergen Tagger]] | | ^ Norwegian | ✔ | ✔ | [[http://tekstlab.uio.no/obt-ny/english/tagset.html|in English]] and [[http://tekstlab.uio.no/obt-ny/index.html|Norwegian]] | | [[https://github.com/noklesta/The-Oslo-Bergen-Tagger|Oslo-Bergen Tagger]] | |
^ Polish | ✔ | ✔ | [[http://nkjp.pl/poliqarp/help/ense2.html#x3-20002|in English]] and [[http://nkjp.pl/poliqarp/help/plse2.html#x3-20002|Polish]] | [[http://nlp.ipipan.waw.pl/%7Eadamp/Papers/2003-eacl-ws12/|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_pl&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] |[[http://sgjp.pl/morfeusz/|Morfeusz]], [[https://github.com/kwrobel-nlp/krnnt|KRNNT]] | | ^ Polish | ✔ | ✔ | [[http://nkjp.pl/poliqarp/help/ense2.html#x3-20002|in English]] and [[http://nkjp.pl/poliqarp/help/plse2.html#x3-20002|Polish]] | [[http://nlp.ipipan.waw.pl/%7Eadamp/Papers/2003-eacl-ws12/|in English]] |[[http://sgjp.pl/morfeusz/|Morfeusz]], [[https://github.com/kwrobel-nlp/krnnt|KRNNT]] | |
^ Portuguese | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/Portuguese-Tagset.html|in Spanish]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_pt&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Portuguese | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/Portuguese-Tagset.html|in Spanish]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Russian | ✔ | ✔ | [[http://corpus.leeds.ac.uk/mocky/ru-table.tab|in English]] | [[http://nl.ijs.si/ME/V4/msd/html/msd-ru.html|in English]] %%***%%) | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_ru&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Russian | ✔ | ✔ | [[http://corpus.leeds.ac.uk/mocky/ru-table.tab|in English]] | [[http://nl.ijs.si/ME/V4/msd/html/msd-ru.html|in English]] %%***%%) |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Slovak | ✔ | ✔ | [[http://korpus.sk/morpho.html/|in Slovak]] and [[https://korpus.sk/morpho_en.html/|English]] | [[https://korpus.sk/attachments/morpho_en/tagset-www.pdf|in Slovak]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_sk&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Radovan Garabík, Morče]] | | ^ Slovak | ✔ | ✔ | [[http://korpus.sk/morpho.html/|in Slovak]] and [[https://korpus.sk/morpho_en.html/|English]] | [[https://korpus.sk/attachments/morpho_en/tagset-www.pdf|in Slovak]] | [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Radovan Garabík, Morče]] | |
^ Slovene | ✔ | ✔ | | [[http://nl.ijs.si/jos/msd/html-en/josMSD-en.html|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_sl&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]] | | ^ Slovene | ✔ | ✔ | | [[http://nl.ijs.si/jos/msd/html-en/josMSD-en.html|in English]] | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]] | |
^ Serbian | ✔ | ✔ | [[https://www.sketchengine.eu/multext-east-serbian-part-of-speech-tagset/|in English]] | [[http://nl.ijs.si/ME/V4/msd/html/msd-sr.html|in English]] | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_sr&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]] | | ^ Serbian | ✔ | ✔ | [[https://www.sketchengine.eu/multext-east-serbian-part-of-speech-tagset/|in English]] | [[http://nl.ijs.si/ME/V4/msd/html/msd-sr.html|in English]] | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]] | |
^ Spanish | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/spanish-tagset.txt|in English]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_es&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | | ^ Spanish | ✔ | ✔ | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/spanish-tagset.txt|in English]] | | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] | |
^ Swedish | ✔ | ✔ | [[http://spraakbanken.gu.se/korp/markup/msdtags.html|in Swedish and English]] | | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_sv&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|Stagger]] | | ^ Swedish | ✔ | ✔ | [[http://spraakbanken.gu.se/korp/markup/msdtags.html|in Swedish and English]] | | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|Stagger]] | |
^ Ukrainian | ✔ | ✔ | | [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%) | [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_uk&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]] | [[http://ufal.mff.cuni.cz/udpipe/2|UDPipe]] | | ^ Ukrainian | ✔ | ✔ | | [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%) | [[http://ufal.mff.cuni.cz/udpipe/2|UDPipe]] | |
| |
| |
When citing a specific part of InterCorp, please use the reference displayed in the corpus description in KonText, e.g.: | When citing a specific part of InterCorp, please use the reference displayed in the corpus description in KonText, e.g.: |
| |
Rosen, A., Vavřín, M., Zasina, A. J. (2022). //The InterCorp Corpus – Czech((Insert languages actually used.)), version 14 of 17 January 2022//. Institute of the Czech National Corpus, Charles University, Prague 2020. Available on-line: https://kontext.korpus.cz/ | Rosen, A., Vavřín, M., Zasina, A. J. (2022). //The InterCorp Corpus – Czech((Insert languages actually used.)), version 14 of 31 January 2022//. Institute of the Czech National Corpus, Charles University, Prague 2022. Available on-line: https://kontext.korpus.cz/ |
| |
</WRAP> | </WRAP> |