Differences

This shows you the differences between two versions of the page.

en:cnk:intercorp:verze14 [2022/01/14 15:34] – [Structural attributes] Alexandr Rosen
en:cnk:intercorp:verze14 [2022/04/01 15:50] (current) – [Corpus size in thousands of words] Michal Škrabal
Line 1: Line 1:
 ====== InterCorp Release 14 ======
- 
-numbers: TODO! 
  
 ^ Name ^^ Czech -- core ^ Czech -- collections ^ other -- core ^ other -- collections ^
-^ Positions ^ Number of tokens |  141,032,521 |  116,673,043 |  394,042,551 |  1,550,071,364 |
-^ ::: ^ Number of word forms |  113,838,505 |  89,819,773 |   327,968,369 |  1,223,270,610 |
-^ Structural attributes ^ Number of documents |  1,657 |  30 |  3,993 |   282 |
-^ ::: ^ Number of texts |  1,657 |  111,951 |  3,993 |  1,843,528 |
-^ ::: ^ Number of sentences |  9,782,001 |  13,606,183 |  24,305,621 |  143,195,566 |
+^ Positions ^ Number of tokens |  145,640,866 |  116,673,038 |  418,967,492 |  1,548,425,287 |
+^ ::: ^ Number of word forms |  117,606,467 |  89,819,772 |   348,771,933 |  1,223,221,264 |
+^ Structural attributes ^ Number of documents |  1,708 |  30 |  4,220 |   282 |
+^ ::: ^ Number of texts |  1,708 |  111,951 |  4,220 |  1,843,528 |
+^ ::: ^ Number of sentences |  10,095,074 |  136,606,183 |  25,872,393 |  143,195,566 |
 ^ Further information ^ reference |  YES   ^^^^
 ^ ::: ^ representative |  NO  ^^^^
Line 65: Line 63:
 ^  hi  ^ Hindi |  409 |  0 |  0 |  0 |  0 |  0 |  0 |  409 |
 ^  hr  ^ Croatian |  22 736 |  0 |  0 |  0 |  0 |  19 048 |  571 |  42 356 |
-^  hu  ^ Hungarian |  110 |  0 |  0 |  0 |  0 |  0 |  0 |  110 |
-^  hs  ^ Upper Sorbian |  6 444 |  0 |  0 |  17 852 |  12 198 |  21 115 |  0 |  57 609 |
+^  hs  ^ Upper Sorbian |  110 |  0 |  0 |  0 |  0 |  0 |  0 |  110 |
+^  hu  ^ Hungarian |  6 444 |  0 |  0 |  17 852 |  12 198 |  21 115 |  0 |  57 609 |
 ^  is  ^ Icelandic|  0 |  0 |  0 |  0 |  0 |  1 581 |  0 |  1 581 |
 ^  it  ^ Italian |  15 741 |  1 252 |  2 747 |  23 771 |  15 494 |  14 700 |  684 |  74 389 |
Line 101: Line 99:
  
 ^  Language  ^  Tags  ^  Lemmas  ^  Brief description  ^  Detailed description  ^ Tags in the corpus ^ Tool  ^
-^ Belarusian |  ✔  |   ✔    [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%)  |  [[https://universaldependencies.org/be/index.html#morphology|in English]]%%****%%)  |   [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_be&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://ufal.mff.cuni.cz/udpipe/2|UDPipe]] +^ Belarusian |  ✔  |   ✔    [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%)  |  [[https://universaldependencies.org/be/index.html#morphology|in English]]%%****%%)  |   [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_be&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://ufal.mff.cuni.cz/udpipe/2|UDPipe]] 
-^ Bulgarian |  ✔  |   ✔    [[https://www.sketchengine.eu/bulgarian-treebank-part-of-speech-tagset/|in English]]    [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/BTB-TR03_BulTreeBank_morphosyntactic_tag.pdf|in English]]  |   [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_bg&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ Bulgarian |  ✔  |   ✔    [[https://www.sketchengine.eu/bulgarian-treebank-part-of-speech-tagset/|in English]]    [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/BTB-TR03_BulTreeBank_morphosyntactic_tag.pdf|in English]]  |   [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_bg&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ Catalan |  ✔  |  ✔  |  [[http://clic.ub.edu/corpus/webfm_send/18|in English]]  |      [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_ca&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ Catalan |  ✔  |  ✔  |  [[http://clic.ub.edu/corpus/webfm_send/18|in English]]  |      [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_ca&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ Chinese |  ✔  |    |  [[https://www.sketchengine.eu/chinese-penn-treebank-part-of-speech-tagset/|in English]]  |  [[https://repository.upenn.edu/cgi/viewcontent.cgi?article=1039&context=ircs_reports|in English]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_zh&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://www.sutd.edu.sg/cmsresource/faculty/yuezhang/zpar.html|ZPar v0.7.5]] +^ Chinese |  ✔  |    |  [[https://www.sketchengine.eu/chinese-penn-treebank-part-of-speech-tagset/|in English]]  |  [[https://repository.upenn.edu/cgi/viewcontent.cgi?article=1039&context=ircs_reports|in English]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_zh&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://www.sutd.edu.sg/cmsresource/faculty/yuezhang/zpar.html|ZPar v0.7.5]] 
-^ Croatian |  ✔  |  ✔  |   [[https://github.com/ffnlp/sethr/blob/master/mte4r-upos.mapping|in English]]  |  [[http://nlp.ffzg.hr/data/tagging/msd-hr.html|in English]]    [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_hr&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]]   | +^ Croatian |  ✔  |  ✔  |   [[https://github.com/ffnlp/sethr/blob/master/mte4r-upos.mapping|in English]]  |  [[http://nlp.ffzg.hr/data/tagging/msd-hr.html|in English]]    [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_hr&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]]   | 
-^ Czech |  ✔  |  ✔  |  [[http://wiki.korpus.cz/doku.php/seznamy:tagy|in Czech]] and [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html|English]] |  [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/docc0pos.pdf|in English]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_cs&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://ufal.mff.cuni.cz/morce/index.php|Morče]] +^ Czech |  ✔  |  ✔  |  [[http://wiki.korpus.cz/doku.php/seznamy:tagy|in Czech]] and [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html|English]] |  [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/docc0pos.pdf|in English]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_cs&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://ufal.mff.cuni.cz/morce/index.php|Morče]] 
-^ Dutch |  ✔  |  ✔    |   [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/dutch-tagset.txt|in English]]  |    [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_nl&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ Dutch |  ✔  |  ✔    |   [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/dutch-tagset.txt|in English]]  |    [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_nl&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ English |  ✔    ✔  |  [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/PennTreebankTags.pdf|in English]]  | [[http://utkl.ff.cuni.cz/%7Erosen/public/Penn-Treebank-Tagset.pdf|in English]] + [[http://utkl.ff.cuni.cz/%7Erosen/public/PennTagAdd.html|additions]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_en&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ English |  ✔    ✔  |  [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/PennTreebankTags.pdf|in English]]  | [[http://utkl.ff.cuni.cz/%7Erosen/public/Penn-Treebank-Tagset.pdf|in English]] + [[http://utkl.ff.cuni.cz/%7Erosen/public/PennTagAdd.html|additions]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_en&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ Estonian |  ✔  |  ✔  |  [[http://www.cl.ut.ee/korpused/morfliides/seletus|in Estonian and English]]  |       [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_et&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ Estonian |  ✔  |  ✔  |  [[http://www.cl.ut.ee/korpused/morfliides/seletus|in Estonian and English]]  |       [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_et&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ Finnish |  ✔  |  ✔  |  [[https://www.sketchengine.co.uk/finntreebank|in English]]%%*%%)  |  [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/sources/FinnTreeBankManual.pdf|in English]]%%*%%)  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_fi&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  |[[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/omorfi/README.shtml|OMorFi]] +[[https://code.google.com/archive/p/hunpos/|HunPOS]] +^ Finnish |  ✔  |  ✔  |  [[https://www.sketchengine.co.uk/finntreebank|in English]]%%*%%)  |  [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/sources/FinnTreeBankManual.pdf|in English]]%%*%%)  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_fi&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  |[[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/omorfi/README.shtml|OMorFi]] +[[https://code.google.com/archive/p/hunpos/|HunPOS]] 
-^ French |  ✔  |  ✔  |  [[https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/french-tagset.html|in English]]  |      [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_fr&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ French |  ✔  |  ✔  |  [[https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/french-tagset.html|in English]]  |      [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_fr&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ German |  ✔  |  ✔  |  [[https://www.sketchengine.co.uk/German-rftagger-part-of-speech-tagset/|in English]] %%**%%)  |  [[http://utkl.ff.cuni.cz/%7Erosen/public/stts_guide.pdf|in German]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_de&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] +^ German |  ✔  |  ✔  |  [[https://www.sketchengine.co.uk/German-rftagger-part-of-speech-tagset/|in English]] %%**%%)  |  [[http://utkl.ff.cuni.cz/%7Erosen/public/stts_guide.pdf|in German]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_de&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] 
-^ Hungarian |  ✔  |        |  [[http://www.inf.u-szeged.hu/projectdirs/hlt/en/Szeged%20Treebank%202.0_en.html|in English]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_hu&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] +^ Hungarian |  ✔  |        |  [[http://www.inf.u-szeged.hu/projectdirs/hlt/en/Szeged%20Treebank%202.0_en.html|in English]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_hu&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] 
-^ Icelandic |  ✔  |  ✔  |  [[http://www.malfong.is/files/ot_tagset_files_en.pdf|in English]]    [[http://nlp.cs.ru.is/pdf/Tagset.pdf|in English]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_is&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|IceStagger]] +^ Icelandic |  ✔  |  ✔  |  [[http://www.malfong.is/files/ot_tagset_files_en.pdf|in English]]    [[http://nlp.cs.ru.is/pdf/Tagset.pdf|in English]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_is&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|IceStagger]] 
-^ Italian |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-tagset.txt|in English]]        [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_it&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ Italian |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-tagset.txt|in English]]        [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_it&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ Japanese |  ✔  |  ✔  |  [[https://www.sketchengine.eu/tagset-jp-mecab/|in English]]        [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_ja&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://taku910.github.io/mecab/|MeCab]] + [[https://unidic.ninjal.ac.jp|Unidic]] +^ Japanese |  ✔  |  ✔  |  [[https://www.sketchengine.eu/tagset-jp-mecab/|in English]]        [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_ja&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://taku910.github.io/mecab/|MeCab]] + [[https://unidic.ninjal.ac.jp|Unidic]] 
-^ Latvian |  ✔  |  ✔  |   [[http://www.semti-kamols.lv/doc_upl/TagSet.html|in Latvian]]  |      [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_lv&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://peteris.rocks/blog/latvian-part-of-speech-tagging|LVTagger]] +^ Latvian |  ✔  |  ✔  |   [[http://www.semti-kamols.lv/doc_upl/TagSet.html|in Latvian]]  |      [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_lv&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://peteris.rocks/blog/latvian-part-of-speech-tagging|LVTagger]] 
-^ Norwegian |  ✔  |  ✔  |  [[http://tekstlab.uio.no/obt-ny/english/tagset.html|in English]] and [[http://tekstlab.uio.no/obt-ny/index.html|Norwegian]]  |    |    [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_no&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://github.com/noklesta/The-Oslo-Bergen-Tagger|Oslo-Bergen Tagger]] +^ Norwegian |  ✔  |  ✔  |  [[http://tekstlab.uio.no/obt-ny/english/tagset.html|in English]] and [[http://tekstlab.uio.no/obt-ny/index.html|Norwegian]]  |    |    [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_no&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://github.com/noklesta/The-Oslo-Bergen-Tagger|Oslo-Bergen Tagger]] 
-^ Polish |  ✔  |  ✔  |  [[http://nkjp.pl/poliqarp/help/ense2.html#x3-20002|in English]] and [[http://nkjp.pl/poliqarp/help/plse2.html#x3-20002|Polish]]  |  [[http://nlp.ipipan.waw.pl/%7Eadamp/Papers/2003-eacl-ws12/|in English]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_pl&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  |[[http://sgjp.pl/morfeusz/|Morfeusz]], [[https://github.com/kwrobel-nlp/krnnt|KRNNT]] +^ Polish |  ✔  |  ✔  |  [[http://nkjp.pl/poliqarp/help/ense2.html#x3-20002|in English]] and [[http://nkjp.pl/poliqarp/help/plse2.html#x3-20002|Polish]]  |  [[http://nlp.ipipan.waw.pl/%7Eadamp/Papers/2003-eacl-ws12/|in English]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_pl&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  |[[http://sgjp.pl/morfeusz/|Morfeusz]], [[https://github.com/kwrobel-nlp/krnnt|KRNNT]] 
-^ Portuguese |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/Portuguese-Tagset.html|in Spanish]]  |      [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_pt&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ Portuguese |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/Portuguese-Tagset.html|in Spanish]]  |      [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_pt&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ Russian |  ✔  |  ✔  |  [[http://corpus.leeds.ac.uk/mocky/ru-table.tab|in English]]  |  [[http://nl.ijs.si/ME/V4/msd/html/msd-ru.html|in English]] %%***%%)  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_ru&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ Russian |  ✔  |  ✔  |  [[http://corpus.leeds.ac.uk/mocky/ru-table.tab|in English]]  |  [[http://nl.ijs.si/ME/V4/msd/html/msd-ru.html|in English]] %%***%%)  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_ru&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ Slovak |  ✔  |  ✔  |  [[http://korpus.sk/morpho.html/|in Slovak]] and [[https://korpus.sk/morpho_en.html/|English]]  |  [[https://korpus.sk/attachments/morpho_en/tagset-www.pdf|in Slovak]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_sk&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Radovan Garabík, Morče]] +^ Slovak |  ✔  |  ✔  |  [[http://korpus.sk/morpho.html/|in Slovak]] and [[https://korpus.sk/morpho_en.html/|English]]  |  [[https://korpus.sk/attachments/morpho_en/tagset-www.pdf|in Slovak]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_sk&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Radovan Garabík, Morče]] 
-^ Slovene |  ✔  |  ✔  |    [[http://nl.ijs.si/jos/msd/html-en/josMSD-en.html|in English]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_sl&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]] +^ Slovene |  ✔  |  ✔  |    [[http://nl.ijs.si/jos/msd/html-en/josMSD-en.html|in English]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_sl&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]] 
-^ Serbian |  ✔  |  ✔  |  [[https://www.sketchengine.eu/multext-east-serbian-part-of-speech-tagset/|in English]]  |   [[http://nl.ijs.si/ME/V4/msd/html/msd-sr.html|in English]]    [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_sr&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]]   | +^ Serbian |  ✔  |  ✔  |  [[https://www.sketchengine.eu/multext-east-serbian-part-of-speech-tagset/|in English]]  |   [[http://nl.ijs.si/ME/V4/msd/html/msd-sr.html|in English]]    [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_sr&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]]   | 
-^ Spanish |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/spanish-tagset.txt|in English]]  |      [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_es&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ Spanish |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/spanish-tagset.txt|in English]]  |      [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_es&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ Swedish |  ✔  |  ✔  |  [[http://spraakbanken.gu.se/korp/markup/msdtags.html|in Swedish and English]]        [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_sv&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|Stagger]] +^ Swedish |  ✔  |  ✔  |  [[http://spraakbanken.gu.se/korp/markup/msdtags.html|in Swedish and English]]        [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_sv&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|Stagger]] 
-^ Ukrainian |  ✔  |  ✔  |  |  [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%)  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v13_uk&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://ufal.mff.cuni.cz/udpipe/2|UDPipe]]  |+^ Ukrainian |  ✔  |  ✔  |  |  [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%)  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_uk&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://ufal.mff.cuni.cz/udpipe/2|UDPipe]]  |
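
The only change in the table rows above is the corpus name inside each "Tags in the corpus" link, which moves from ''intercorp_v13_xx'' to ''intercorp_v14_xx''. A minimal sketch of how such a link could be generated from a language code follows. The query parameters are copied from the links in the table; the function name ''tag_list_url'' and the ''release'' argument are hypothetical, and whether the resulting URL still resolves (or requires a KonText login) is not something this page confirms.

```python
from urllib.parse import urlencode

# Base address taken from the "list" links in the table above.
KONTEXT_WORDLIST = "https://kontext.korpus.cz/wordlist/result"


def tag_list_url(lang: str, release: int = 14) -> str:
    """Build a tag frequency list link for one InterCorp language.

    Hypothetical helper: it only reproduces the URL pattern seen in the
    table and does not call any KonText API.
    """
    params = [
        ("wlnums", "frq"),
        ("wlpat", ".*"),
        ("blhash", ""),
        ("include_nonwords", "0"),
        ("wlsort", "f"),
        ("corpname", f"intercorp_v{release}_{lang}"),
        ("wlattr", "tag"),
        ("usesubcorp", ""),
        ("wlminfreq", "1"),
        ("wlhash", ""),
        ("wlpage", "1"),
    ]
    # Keep "*" unescaped so the pattern ".*" appears as it does in the table.
    return f"{KONTEXT_WORDLIST}?{urlencode(params, safe='*')}"


if __name__ == "__main__":
    for code in ("be", "cs", "uk"):
        print(tag_list_url(code))
```

Generating the links from a single template like this would reduce a release bump to a one-argument change instead of an edit to every row.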
  
  
Line 235: Line 233:
 When citing a specific part of InterCorp please use the reference displayed in KonText in the corpus description, e.g. as:
  
-Rosen, A., Vavřín, M., Zasina, A. J. (2022). //The InterCorp Corpus – Czech((Insert languages actually used.)), version 14 of 17 January 2022//. Institute of the Czech National Corpus, Charles University, Prague 2020. Available on-line: https://kontext.korpus.cz/
+Rosen, A., Vavřín, M., Zasina, A. J. (2022). //The InterCorp Corpus – Czech((Insert languages actually used.)), version 14 of 31 January 2022//. Institute of the Czech National Corpus, Charles University, Prague 2022. Available on-line: https://kontext.korpus.cz/
  
 </WRAP>