
Differences

This shows you the differences between two versions of the page.

en:cnk:intercorp:verze15 [2022/11/23 14:06] – [Corpus size in thousands of words] alexandrrosenen:cnk:intercorp:verze15 [2024/04/18 13:43] – [Morphosyntactic annotation] jankocek
Line 99: Line 99:
  
 ^  Language  ^  Tags  ^  Lemmas  ^  Brief description  ^  Detailed description  ^ Tags in the corpus ^ Tool  ^
-^ Belarusian |  ✔  |   ✔    [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%)  |  [[https://universaldependencies.org/be/index.html#morphology|in English]]%%****%%)  |   [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_be&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://ufal.mff.cuni.cz/udpipe/2|UDPipe]] +^ Belarusian |  ✔  |   ✔    [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%)  |  [[https://universaldependencies.org/be/index.html#morphology|in English]]%%****%%)  |   [[https://www.korpus.cz/kontext/wordlist/result?q=~ju0ayEyoeIOi|list]]  | [[http://ufal.mff.cuni.cz/udpipe/2|UDPipe]] 
-^ Bulgarian |  ✔  |   ✔    [[https://www.sketchengine.eu/bulgarian-treebank-part-of-speech-tagset/|in English]]    [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/BTB-TR03_BulTreeBank_morphosyntactic_tag.pdf|in English]]  |   [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_bg&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ Bulgarian |  ✔  |   ✔    [[https://www.sketchengine.eu/bulgarian-treebank-part-of-speech-tagset/|in English]]    [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/BTB-TR03_BulTreeBank_morphosyntactic_tag.pdf|in English]]  |   [[https://www.korpus.cz/kontext/wordlist/result?q=~b6IUUoMyUs8O|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ Catalan |  ✔  |  ✔  |  [[http://clic.ub.edu/corpus/webfm_send/18|in English]]     |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_ca&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ Catalan |  ✔  |  ✔  |  [[http://clic.ub.edu/corpus/webfm_send/18|in English]]   [[http://clic.ub.edu/corpus/webfm_send/18|in English]]  |    [[https://www.korpus.cz/kontext/wordlist/result?q=~cOI6eWQG0c8O|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ Chinese |  ✔  |    |  [[https://www.sketchengine.eu/chinese-penn-treebank-part-of-speech-tagset/|in English]]  |  [[https://repository.upenn.edu/cgi/viewcontent.cgi?article=1039&context=ircs_reports|in English]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_zh&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://www.sutd.edu.sg/cmsresource/faculty/yuezhang/zpar.html|ZPar v0.7.5]]  +^ Chinese |  ✔  |    |  [[https://www.sketchengine.eu/chinese-penn-treebank-part-of-speech-tagset/|in English]]  |  [[https://repository.upenn.edu/cgi/viewcontent.cgi?article=1039&context=ircs_reports|in English]]  |  [[https://www.korpus.cz/kontext/wordlist/result?q=~uwCay4cSYSy2|list]]  | [[https://www.sutd.edu.sg/cmsresource/faculty/yuezhang/zpar.html|ZPar v0.7.5]]   
-^ Croatian |  ✔  |  ✔  |   [[https://github.com/ffnlp/sethr/blob/master/mte4r-upos.mapping|in English]]  |  [[http://nlp.ffzg.hr/data/tagging/msd-hr.html|in English]]    [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_hr&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]]   | +^ Croatian |  ✔  |  ✔  |   [[https://github.com/ffnlp/sethr/blob/master/mte4r-upos.mapping|in English]]  |  [[http://nlp.ffzg.hr/data/tagging/msd-hr.html|in English]]    [[https://www.korpus.cz/kontext/wordlist/result?q=~CeqE4wiqmIoA|list]]  | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]]   | 
-^ Czech |  ✔  |  ✔  |  [[http://wiki.korpus.cz/doku.php/seznamy:tagy|in Czech]] and [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html|English]] |  [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/docc0pos.pdf|in English]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_cs&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://ufal.mff.cuni.cz/morce/index.php|Morče]] +^ Czech |  ✔  |  ✔  |  [[http://wiki.korpus.cz/doku.php/seznamy:tagy|in Czech]] and [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html|English]] |  [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/docc0pos.pdf|in English]]  |  [[https://www.korpus.cz/kontext/wordlist/result?q=~wK68uwI0uWiW|list]]  | [[http://ufal.mff.cuni.cz/morce/index.php|Morče]] 
-^ Dutch |  ✔  |  ✔    |   [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/dutch-tagset.txt|in English]]  |    [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_nl&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ Dutch |  ✔  |  ✔    |   [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/dutch-tagset.txt|in English]]  |    [[https://www.korpus.cz/kontext/wordlist/result?q=~KSoiyk0CuCCc|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ English |  ✔    ✔  |  [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/PennTreebankTags.pdf|in English]]  | [[http://utkl.ff.cuni.cz/%7Erosen/public/Penn-Treebank-Tagset.pdf|in English]] + [[http://utkl.ff.cuni.cz/%7Erosen/public/PennTagAdd.html|additions]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_en&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ English |  ✔    ✔  |  [[http://utkl.ff.cuni.cz/~rosen/INTERCORP/TAGSETS/PennTreebankTags.pdf|in English]]  | [[http://utkl.ff.cuni.cz/%7Erosen/public/Penn-Treebank-Tagset.pdf|in English]] + [[http://utkl.ff.cuni.cz/%7Erosen/public/PennTagAdd.html|additions]]  |  [[https://www.korpus.cz/kontext/wordlist/result?q=~SYU20meuus0a|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ Estonian |  ✔  |  ✔  |  [[http://www.cl.ut.ee/korpused/morfliides/seletus|in Estonian and English]]  |       [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_et&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ Estonian |  ✔  |  ✔  |  [[http://www.cl.ut.ee/korpused/morfliides/seletus|in Estonian and English]]  |       [[https://www.korpus.cz/kontext/wordlist/result?q=~mWSCSIKm8OcY|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ Finnish |  ✔  |  ✔  |  [[https://www.sketchengine.co.uk/finntreebank|in English]]%%*%%)  |  [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/sources/FinnTreeBankManual.pdf|in English]]%%*%%)  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_fi&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  |[[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/omorfi/README.shtml|OMorFi]] +[[https://code.google.com/archive/p/hunpos/|HunPOS]] +^ Finnish |  ✔  |  ✔  |  [[https://www.sketchengine.co.uk/finntreebank|in English]]%%*%%)  |  [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/sources/FinnTreeBankManual.pdf|in English]]%%*%%)  |  [[https://www.korpus.cz/kontext/wordlist/result?q=~6iw6q2e06KcI|list]]  |[[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/omorfi/README.shtml|OMorFi]] +[[https://code.google.com/archive/p/hunpos/|HunPOS]] 
-^ French |  ✔  |  ✔  |  [[https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/french-tagset.html|in English]]  |      [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_fr&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ French |  ✔  |  ✔  |  [[https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/french-tagset.html|in English]]  |      [[https://www.korpus.cz/kontext/wordlist/result?q=~m6aC4MMkssms|list]]  |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ German |  ✔  |  ✔  |  [[https://www.sketchengine.co.uk/German-rftagger-part-of-speech-tagset/|in English]] %%**%%)  |  [[http://utkl.ff.cuni.cz/%7Erosen/public/stts_guide.pdf|in German]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_de&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] +^ German |  ✔  |  ✔  |  [[https://www.sketchengine.co.uk/German-rftagger-part-of-speech-tagset/|in English]] %%**%%)  |  [[http://utkl.ff.cuni.cz/%7Erosen/public/stts_guide.pdf|in German]]  |  [[https://www.korpus.cz/kontext/wordlist/result?q=~u4ISOKym04am|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] 
-^ Hungarian |  ✔  |        |  [[http://www.inf.u-szeged.hu/projectdirs/hlt/en/Szeged%20Treebank%202.0_en.html|in English]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_hu&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] +^ Hungarian |  ✔  |        |  [[http://www.inf.u-szeged.hu/projectdirs/hlt/en/Szeged%20Treebank%202.0_en.html|in English]]  |  [[https://www.korpus.cz/kontext/wordlist/result?q=~jSyOE2A2KKsQ|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] 
-^ Icelandic |  ✔  |  ✔  |  [[http://www.malfong.is/files/ot_tagset_files_en.pdf|in English]]    [[http://nlp.cs.ru.is/pdf/Tagset.pdf|in English]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_is&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|IceStagger]] +^ Icelandic |  ✔  |  ✔  |  [[http://www.malfong.is/files/ot_tagset_files_en.pdf|in English]]    [[http://nlp.cs.ru.is/pdf/Tagset.pdf|in English]]  |  [[https://www.korpus.cz/kontext/wordlist/result?q=~bEoEKqasyiEe|list]]  | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|IceStagger]] 
-^ Italian |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-tagset.txt|in English]]        [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_it&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ Italian |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/italian-tagset.txt|in English]]        [[https://www.korpus.cz/kontext/wordlist/result?q=~fmIIwaQqWGqm|list]]  |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ Japanese |  ✔  |  ✔  |  [[https://www.sketchengine.eu/tagset-jp-mecab/|in English]]        [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_ja&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://taku910.github.io/mecab/|MeCab]] + [[https://unidic.ninjal.ac.jp|Unidic]] +^ Japanese |  ✔  |  ✔  |  [[https://www.sketchengine.eu/tagset-jp-mecab/|in English]]        [[https://www.korpus.cz/kontext/wordlist/result?q=~hIOk8CYaIMqm|list]]  | [[https://taku910.github.io/mecab/|MeCab]] + [[https://unidic.ninjal.ac.jp|Unidic]] 
-^ Latvian |  ✔  |  ✔  |   [[http://www.semti-kamols.lv/doc_upl/TagSet.html|in Latvian]]  |      [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_lv&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://peteris.rocks/blog/latvian-part-of-speech-tagging|LVTagger]] +^ Latvian |  ✔  |  ✔  |   [[http://www.semti-kamols.lv/doc_upl/TagSet.html|in Latvian]]  |      [[https://www.korpus.cz/kontext/wordlist/result?q=~GeQ8SSOCouq0|list]]  | [[https://peteris.rocks/blog/latvian-part-of-speech-tagging|LVTagger]] 
-^ Norwegian |  ✔  |  ✔  |  [[http://tekstlab.uio.no/obt-ny/english/tagset.html|in English]] and [[http://tekstlab.uio.no/obt-ny/index.html|Norwegian]]  |    |    [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_no&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://github.com/noklesta/The-Oslo-Bergen-Tagger|Oslo-Bergen Tagger]]  | +^ Norwegian |  ✔  |  ✔  |  [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%)  |  [[https://universaldependencies.org/no/index.html#morphology|in English]]%%****%%)     [[https://www.korpus.cz/kontext/wordlist/result?q=~EcIww4ecGgOG|list]]  | [[https://web.archive.org/web/20170122231904/http://lindat.mff.cuni.cz/services/udpipe/api-reference.php|UDPipe]]  | 
-^ Polish |  ✔  |  ✔  |  [[http://nkjp.pl/poliqarp/help/ense2.html#x3-20002|in English]] and [[http://nkjp.pl/poliqarp/help/plse2.html#x3-20002|Polish]]  |  [[http://nlp.ipipan.waw.pl/%7Eadamp/Papers/2003-eacl-ws12/|in English]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_pl&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  |[[http://sgjp.pl/morfeusz/|Morfeusz]], [[https://github.com/kwrobel-nlp/krnnt|KRNNT]] +^ Polish |  ✔  |  ✔  |  [[http://nkjp.pl/poliqarp/help/ense2.html#x3-20002|in English]] and [[http://nkjp.pl/poliqarp/help/plse2.html#x3-20002|Polish]]  |  [[http://nlp.ipipan.waw.pl/%7Eadamp/Papers/2003-eacl-ws12/|in English]]  |  [[https://www.korpus.cz/kontext/wordlist/result?q=~McUUoI6EwKaC|list]]  |[[http://sgjp.pl/morfeusz/|Morfeusz]], [[https://github.com/kwrobel-nlp/krnnt|KRNNT]] 
-^ Portuguese |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/Portuguese-Tagset.html|in Spanish]]  |      [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_pt&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ Portuguese |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/Portuguese-Tagset.html|in Spanish]]  |      [[https://www.korpus.cz/kontext/wordlist/result?q=~Fis6w6WSYqYg|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ Russian |  ✔  |  ✔  |  [[http://corpus.leeds.ac.uk/mocky/ru-table.tab|in English]]  |  [[http://nl.ijs.si/ME/V4/msd/html/msd-ru.html|in English]] %%***%%)  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_ru&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ Russian |  ✔  |  ✔  |  [[http://corpus.leeds.ac.uk/mocky/ru-table.tab|in English]]  |  [[http://nl.ijs.si/ME/V4/msd/html/msd-ru.html|in English]] %%***%%)  |  [[https://www.korpus.cz/kontext/wordlist/result?q=~Ymey666Kk0qe|list]]  |[[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ Slovak |  ✔  |  ✔  |  [[http://korpus.sk/morpho.html/|in Slovak]] and [[https://korpus.sk/morpho_en.html/|English]]  |  [[https://korpus.sk/attachments/morpho_en/tagset-www.pdf|in Slovak]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_sk&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Radovan Garabík, Morče]] +^ Slovak |  ✔  |  ✔  |  [[http://korpus.sk/morpho.html/|in Slovak]] and [[https://korpus.sk/morpho_en.html/|English]]  |  [[https://korpus.sk/attachments/morpho_en/tagset-www.pdf|in Slovak]]  |  [[https://www.korpus.cz/kontext/wordlist/result?q=~mKMiKqM6CqO2|list]]  | [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Radovan Garabík, Morče]] 
-^ Slovene |  ✔  |  ✔  |    [[http://nl.ijs.si/jos/msd/html-en/josMSD-en.html|in English]]  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_sl&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]] +^ Slovene |  ✔  |  ✔  |    [[http://nl.ijs.si/jos/msd/html-en/josMSD-en.html|in English]]  |  [[https://www.korpus.cz/kontext/wordlist/result?q=~FkkKukIsmeue|list]]  | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]] 
-^ Serbian |  ✔  |  ✔  |  [[https://www.sketchengine.eu/multext-east-serbian-part-of-speech-tagset/|in English]]  |   [[http://nl.ijs.si/ME/V4/msd/html/msd-sr.html|in English]]    [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_sr&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]]   | +^ Serbian |  ✔  |  ✔  |  [[https://www.sketchengine.eu/multext-east-serbian-part-of-speech-tagset/|in English]]  |   [[http://nl.ijs.si/ME/V4/msd/html/msd-sr.html|in English]]    [[https://www.korpus.cz/kontext/wordlist/result?q=~bGMCy2o2EwOM|list]]  | [[https://github.com/clarinsi/reldi-tagger|ReLDI Tagger]]   | 
-^ Spanish |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/spanish-tagset.txt|in English]]  |      [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_es&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] +^ Spanish |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/spanish-tagset.txt|in English]]  |      [[https://www.korpus.cz/kontext/wordlist/result?q=~mQYWIgi6yIK4|list]]  | [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]] 
-^ Swedish |  ✔  |  ✔  |  [[http://spraakbanken.gu.se/korp/markup/msdtags.html|in Swedish and English]]        [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_sv&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|Stagger]] +^ Swedish |  ✔  |  ✔  |  [[http://spraakbanken.gu.se/korp/markup/msdtags.html|in Swedish and English]]        [[https://www.korpus.cz/kontext/wordlist/result?q=~tcGEoMWww0oC|list]]  | [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|Stagger]] 
-^ Ukrainian |  ✔  |  ✔  |  |  [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%)  |  [[https://kontext.korpus.cz/wordlist/result?wlnums=frq&wlpat=.*&blhash=&include_nonwords=0&wlsort=f&corpname=intercorp_v14_uk&wlattr=tag&usesubcorp=&wlminfreq=1&wlhash=&wlpage=1|list]]  | [[http://ufal.mff.cuni.cz/udpipe/2|UDPipe]]  |+^ Ukrainian |  ✔  |  ✔  |  [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%)   [[http://universaldependencies.org/docs/u/pos/index.html|in English]]%%****%%)  |  [[https://www.korpus.cz/kontext/wordlist/result?q=~IKEKEIm2Auug|list]]  | [[http://ufal.mff.cuni.cz/udpipe/2|UDPipe]]  |
  
  
Line 141: Line 141:
 Queries for contracted forms in tagged or lemmatized texts may fail. This applies to forms such as //can't// or //I'm//, which the tagger splits into two parts (//ca//+//n't// and //I//+//'m//), each with its own lemma and tag. The same holds for Polish forms such as //byłam// or //gdybyś// (//była//+//m// and //gdyby//+//ś//). Tokenization may even introduce errors, as in //gdzie ś za Wisłą//, where //gdzieś// is not a contraction. To find the whole contracted form, enter the query as a **Phrase**, with the split parts separated by a space. Only the individual parts of the contracted form are assigned a tag and a lemma.
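
As a rough illustration of the phrase form described above, the following Python sketch rewrites an English contraction into its space-separated parts. The clitic pattern and the function name ''split_contraction'' are assumptions based only on the two English examples given here; the tagger's actual tokenization rules may differ, and the Polish cases are not covered.

```python
import re

# Assumed clitic pattern, derived only from the examples can't and I'm;
# this is not the tagger's actual rule set.
_CLITIC = re.compile(r"(.+?)(n't|'(?:m|re|s|ve|ll|d))", re.IGNORECASE)


def split_contraction(form: str) -> str:
    """Return the form with its assumed clitic separated by a space.

    Forms that do not match the assumed pattern are returned unchanged.
    """
    match = _CLITIC.fullmatch(form)
    if match is None:
        return form
    return f"{match.group(1)} {match.group(2)}"


if __name__ == "__main__":
    for word in ("can't", "I'm", "gdzieś"):
        print(word, "->", split_contraction(word))
```

The returned string is what would be typed into a **Phrase** query; a form without a recognized clitic passes through untouched.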
  
-Morphological tags including characters with a special meaning in regular expressions, e.g. "%%$%%" in the English tag "wp%%$%%", must be preceded in queries by a backslash: tag="wp\$".+Morphological tags including characters with a special meaning in regular expressions, e.g. ''$'' in the English tag ''wp%%$%%'', must be preceded in queries by a backslash: ''tag=%%"wp\$"%%''.
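
The escaping rule above can be sketched in Python. The helper name ''literal_tag_query'' is hypothetical, and the sketch assumes that the escapes produced by Python's ''re.escape'' are also accepted by the corpus query engine, which is not confirmed here.

```python
import re


def literal_tag_query(tag: str) -> str:
    """Build a tag condition that matches the given tag literally.

    re.escape puts a backslash before characters that are special in
    regular expressions, such as "$". Whether the query engine accepts
    every escape that Python produces is an assumption.
    """
    escaped = re.escape(tag)
    return f'tag="{escaped}"'


if __name__ == "__main__":
    print(literal_tag_query("wp$"))
```

Tags made up only of letters and digits are left unchanged, so the helper can be applied to any tag without checking it first.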
 =====Structural attributes=====
  
Line 210: Line 210:
   * [[http://code.google.com/p/hunpos/|HunPOS]] for Hungarian and other languages
   * [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Tagger for Slovak]] (thanks to Radovan Garabík)
-  * [[http://omilia.uio.no/obt/|Tagger]] for Norwegian (thanks to Pavel Vondřička) 
   * [[http://nl2.ijs.si/analyze/|totale]] for Slovene (until Release 11, thanks to Tomaž Erjavec)
   * [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] for German
Line 233: Line 232:
 When citing a specific part of InterCorp, please use the reference displayed in KonText in the corpus description, e.g.:
  
-Rosen, A., Vavřín, M., Zasina, A. J. (2022). //The InterCorp Corpus – Czech((Insert languages actually used.)), version 14 of 31 January 2022//. Institute of the Czech National Corpus, Charles University, Prague 2022. Available on-line: https://kontext.korpus.cz/+Rosen, A., Vavřín, M., Zasina, A. J. (2022). //The InterCorp Corpus – Czech((Insert languages actually used.)), version 15 of 11 November 2022//. Institute of the Czech National Corpus, Charles University, Prague 2022. Available on-line: https://kontext.korpus.cz/
  
 </WRAP> </WRAP>