AplikaceAplikace
Nastavení

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
Next revisionBoth sides next revision
en:cnk:intercorp:verze10 [2017/09/05 14:47] – [Morphosyntactic annotation] alexandrrosenen:cnk:intercorp:verze10 [2017/12/15 21:15] – [InterCorp Release 10] alexandrrosen
Line 2: Line 2:
 ====== InterCorp Release 10 ====== ====== InterCorp Release 10 ======
  
- 
- 
-<WRAP right> 
 ^ Name ^^ Czech -- core ^ Czech -- collections ^ other -- core ^ other -- collections ^ ^ Name ^^ Czech -- core ^ Czech -- collections ^ other -- core ^ other -- collections ^
 ^ Positions ^ Number of tokens |  127,413,531 |  118,069,703 |  311,809,130 |  1,551,411,225 | ^ Positions ^ Number of tokens |  127,413,531 |  118,069,703 |  311,809,130 |  1,551,411,225 |
Line 15: Line 12:
 ^ ::: ^ publication date |  2017  ^^^^ ^ ::: ^ publication date |  2017  ^^^^
 ^ ::: ^ foreign languages |  39  ^^^^ ^ ::: ^ foreign languages |  39  ^^^^
-^ ::: ^ tagged languages |  24  ^^^^ +^ ::: ^ tagged languages |  23  ^^^^ 
-^ ::: ^ lemmatized languages |  23  ^^^^ +^ ::: ^ lemmatized languages |  22  ^^^^
-</WRAP> +
  
 ===== Access to the texts ===== ===== Access to the texts =====
Line 120: Line 115:
 ^ Language ^ Tags ^ Lemmas ^ Brief description ^ Detailed description ^ Tool ^ ^ Language ^ Tags ^ Lemmas ^ Brief description ^ Detailed description ^ Tool ^
 ^ Bulgarian |  ✔  |   ✔    |      [[http://www.bultreebank.org/TechRep/BTB-TR03.pdf|in English]]  |  [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]]  | ^ Bulgarian |  ✔  |   ✔    |      [[http://www.bultreebank.org/TechRep/BTB-TR03.pdf|in English]]  |  [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]]  |
 +^ Catalan |  ✔  |  ✔  |  [[http://clic.ub.edu/corpus/webfm_send/18|in English]]  |      [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
 ^ Croatian |  ✔  |  ✔  |   [[https://github.com/ffnlp/sethr/blob/master/mte4r-upos.mapping|in English]]  |      [[https://github.com/uzh/reldi|ReLDI Tagger]]   | ^ Croatian |  ✔  |  ✔  |   [[https://github.com/ffnlp/sethr/blob/master/mte4r-upos.mapping|in English]]  |      [[https://github.com/uzh/reldi|ReLDI Tagger]]   |
-^ Czech |  ✔  |  ✔  |  [[http://wiki.korpus.cz/doku.php/seznamy:tagy|in Czech]] and [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html|in English]]((There is a helper application to assist you with queries including Czech morphological tags. Click [[http://utkl.ff.cuni.cz/~skoumal/morfo/?lang=en|here]].)) |  [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/docc0pos.pdf|in English]]  |  [[http://ufal.mff.cuni.cz/morce/index.php|Morče]]  |+^ Czech |  ✔  |  ✔  |  [[http://wiki.korpus.cz/doku.php/seznamy:tagy|in Czech]] and [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html|English]]  |  [[http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/docc0pos.pdf|in English]]  |  [[http://ufal.mff.cuni.cz/morce/index.php|Morče]]  |
 ^ Dutch |  ✔  |   ✔    |   [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/dutch-tagset.txt|in English]]  |  [[http://www.inl.nl/tst-centrale/images/stories/producten/documentatie/ehc_handleiding_nl.pdf|in Dutch]]  |  [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]]  | ^ Dutch |  ✔  |   ✔    |   [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/dutch-tagset.txt|in English]]  |  [[http://www.inl.nl/tst-centrale/images/stories/producten/documentatie/ehc_handleiding_nl.pdf|in Dutch]]  |  [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]]  |
 ^ English |  ✔    ✔  |  [[https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html|in English]]  | [[http://utkl.ff.cuni.cz/%7Erosen/public/Penn-Treebank-Tagset.pdf|in English]] + [[http://utkl.ff.cuni.cz/%7Erosen/public/PennTagAdd.html|additions]]  |  [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]]  | ^ English |  ✔    ✔  |  [[https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html|in English]]  | [[http://utkl.ff.cuni.cz/%7Erosen/public/Penn-Treebank-Tagset.pdf|in English]] + [[http://utkl.ff.cuni.cz/%7Erosen/public/PennTagAdd.html|additions]]  |  [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]]  |
 ^ Estonian |  ✔  |  ✔  |  [[http://www.cl.ut.ee/korpused/morfliides/seletus| in Estonian and English]]  |      [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]]  | ^ Estonian |  ✔  |  ✔  |  [[http://www.cl.ut.ee/korpused/morfliides/seletus| in Estonian and English]]  |      [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]]  |
-^ Finnish |  ✔  |  ✔  |     |  [[http://home.gna.org/omorfi/omorfi/omorfi_user.html|in English]] *)  |  [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/omorfi/README.shtml|OMorFi]] +[[https://code.google.com/archive/p/hunpos/|HunPOS]] |+^ Finnish |  ✔  |  ✔  |  [[https://www.sketchengine.co.uk/finntreebank/|in English]]%%*%%)  |  [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/sources/FinnTreeBankManual.pdf|in English]]%%*%%)  |  [[http://www.ling.helsinki.fi/kieliteknologia/tutkimus/omor/omorfi/README.shtml|OMorFi]] +[[https://code.google.com/archive/p/hunpos/|HunPOS]]  |
 ^ French |  ✔  |  ✔  |  [[http://www.ims.uni-stuttgart.de/%7Eschmid/french-tagset.html|in English]]  |      [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]]  | ^ French |  ✔  |  ✔  |  [[http://www.ims.uni-stuttgart.de/%7Eschmid/french-tagset.html|in English]]  |      [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]]  |
-^ German |  ✔  |  ✔  |  [[http://www.sketchengine.co.uk/documentation/wiki/tagsets/german_rftagger|in English]]((Within a single morphological tag a colon rather than period is used as a separator of the individual categories, e.g. ADJA:Pos:Nom:Sg:Fem.))   [[http://utkl.ff.cuni.cz/%7Erosen/public/stts_guide.pdf|in German]]  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] +^ German |  ✔  |  ✔  |  [[https://www.sketchengine.co.uk/German-rftagger-part-of-speech-tagset/|in English]]%%**%%   [[http://utkl.ff.cuni.cz/%7Erosen/public/stts_guide.pdf|in German]]  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]] 
-^ Hungarian |  ✔  |         |  [[http://utkl.ff.cuni.cz/%7Erosen/public/kr_for_ldc.pdf|in English]]  |  [[http://code.google.com/p/hunpos/|HunPos]]  |+^ Hungarian |  ✔  |         [[http://nl.ijs.si/ME/Vault/V3/msd/html/msd.html#SECTION05400000000000000000|in English]]  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/|RFTagger]]  |
 ^ Icelandic |  ✔  |  ✔  |  [[http://www.malfong.is/files/ot_tagset_files_en.pdf|in English]]        [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|IceStagger]]  | ^ Icelandic |  ✔  |  ✔  |  [[http://www.malfong.is/files/ot_tagset_files_en.pdf|in English]]        [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|IceStagger]]  |
 ^ Italian |  ✔  |  ✔  |  [[ftp://ftp.ims.uni-stuttgart.de/corpora/italian-tagset.txt|in English]]  |      [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]]  | ^ Italian |  ✔  |  ✔  |  [[ftp://ftp.ims.uni-stuttgart.de/corpora/italian-tagset.txt|in English]]  |      [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]]  |
 ^ Latvian |  ✔  |  ✔  |   [[http://www.semti-kamols.lv/doc_upl/TagSet.html|in Latvian]]  |      [[https://peteris.rocks/blog/latvian-part-of-speech-tagging|LVTagger]]  | ^ Latvian |  ✔  |  ✔  |   [[http://www.semti-kamols.lv/doc_upl/TagSet.html|in Latvian]]  |      [[https://peteris.rocks/blog/latvian-part-of-speech-tagging|LVTagger]]  |
-^ Lithuanian |  ✔  |  ✔  |  [[http://utkl.ff.cuni.cz/~skoumal/CZ-LT-CORP/tags.html|in Czech and English]]  |  [[http://utkl.ff.cuni.cz/~skoumal/CZ-LT-CORP/LT-POS.pdf|in English]]  |  Author: [[http://senas.vdu.lt/staff/informatics/CVPDF/CV_Daudaravicius_en.pdf|Vidas Daudaravičius]]  | 
 ^ Norwegian |  ✔  |  ✔  | [[http://tekstlab.uio.no/obt-ny/english/tagset.html|in English]] and [[http://tekstlab.uio.no/obt-ny/index.html|Norwegian]] |      [[https://visl.sdu.dk/remoting.html|VISL]]  | ^ Norwegian |  ✔  |  ✔  | [[http://tekstlab.uio.no/obt-ny/english/tagset.html|in English]] and [[http://tekstlab.uio.no/obt-ny/index.html|Norwegian]] |      [[https://visl.sdu.dk/remoting.html|VISL]]  |
 ^ Polish |  ✔  |  ✔  |  [[http://nkjp.pl/poliqarp/help/ense2.html#x3-20002|in English]] and [[http://nkjp.pl/poliqarp/help/plse2.html#x3-20002|Polish]]  |  [[http://nlp.ipipan.waw.pl/%7Eadamp/Papers/2003-eacl-ws12/|in English]]  |  [[http://sgjp.pl/morfeusz/|Morfeusz]], [[http://nlp.pwr.wroc.pl/takipi/|TaKIPI]]  | ^ Polish |  ✔  |  ✔  |  [[http://nkjp.pl/poliqarp/help/ense2.html#x3-20002|in English]] and [[http://nkjp.pl/poliqarp/help/plse2.html#x3-20002|Polish]]  |  [[http://nlp.ipipan.waw.pl/%7Eadamp/Papers/2003-eacl-ws12/|in English]]  |  [[http://sgjp.pl/morfeusz/|Morfeusz]], [[http://nlp.pwr.wroc.pl/takipi/|TaKIPI]]  |
 ^ Portuguese |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/Portuguese-Tagset.html|in Spanish]]  |      [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]]  | ^ Portuguese |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/Portuguese-Tagset.html|in Spanish]]  |      [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]]  |
-^ Russian |  ✔  |  ✔  |  [[http://corpus.leeds.ac.uk/mocky/ru-table.tab|in English]]  |  [[http://nl.ijs.si/ME/V4/msd/html/msd-ru.html|in English]] ((Tags in the corpus do not always correspond to those listed in the detailed description. Some morphological categories are omitted in the corpus tags, e.g. pronouns are always tagged only as "P-". All tags, as used in ther corpus, are listed in the brief description.))   |  [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]]  |+^ Russian |  ✔  |  ✔  |  [[http://corpus.leeds.ac.uk/mocky/ru-table.tab|in English]]  |  [[http://nl.ijs.si/ME/V4/msd/html/msd-ru.html|in English]]%%***%%  |  [[http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treetagger.html|TreeTagger]]  |
 ^ Slovak |  ✔  |  ✔  |  [[http://korpus.sk/morpho.html/|in Slovak]]  |  [[http://korpus.sk/attachments/publications/2004-garabik-gianitsova-horak-simkova-tokenizacia.pdf|in Slovak]]  |  [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Radovan Garabík, Morče]]  | ^ Slovak |  ✔  |  ✔  |  [[http://korpus.sk/morpho.html/|in Slovak]]  |  [[http://korpus.sk/attachments/publications/2004-garabik-gianitsova-horak-simkova-tokenizacia.pdf|in Slovak]]  |  [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Radovan Garabík, Morče]]  |
 ^ Slovene |  ✔  |  ✔  |  [[http://nl.ijs.si/ME/V4/msd/html/msd.msds-sl.html|in English and Slovene]]    [[http://nl.ijs.si/ME/V4/msd/html/msd-sl.introduction.html|in English]]  |  [[http://nl2.ijs.si/analyze/|ToTaLe]]  | ^ Slovene |  ✔  |  ✔  |  [[http://nl.ijs.si/ME/V4/msd/html/msd.msds-sl.html|in English and Slovene]]    [[http://nl.ijs.si/ME/V4/msd/html/msd-sl.introduction.html|in English]]  |  [[http://nl2.ijs.si/analyze/|ToTaLe]]  |
 ^ Serbian |  ✔  |  ✔  |   [[http://nl.ijs.si/ME/V4/msd/html/msd.msds-sr.html|in English]]  |      [[https://github.com/uzh/reldi|ReLDI Tagger]]   | ^ Serbian |  ✔  |  ✔  |   [[http://nl.ijs.si/ME/V4/msd/html/msd.msds-sr.html|in English]]  |      [[https://github.com/uzh/reldi|ReLDI Tagger]]   |
-^ Spanish |  ✔  |  ✔  |  [[ftp://ftp.ims.uni-stuttgart.de/corpora/spanish-tagset.txt|in English]]  |      [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |+^ Spanish |  ✔  |  ✔  |  [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/spanish-tagset.txt|in English]]  |      [[http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/|TreeTagger]]  |
 ^ Swedish |  ✔  |  ✔  |  [[http://spraakbanken.gu.se/korp/markup/msdtags.html|in Swedish and English]]        [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|Stagger]]  | ^ Swedish |  ✔  |  ✔  |  [[http://spraakbanken.gu.se/korp/markup/msdtags.html|in Swedish and English]]        [[http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986|Stagger]]  |
  
  
-<wrap lo>%%*%%)The corpus includes tags in a condensed form, e.g. V:Sg:Nom:Act:PrfPrc:Pos corresponds to [POS=V] [NUM=SG] [CASE=NOM] [VOICE=ACT] [PCP=PRFPRC] [CMP=POS]. Similarly, Pron:Pers:Sg:Ade:Up corresponds to [POS=PRON] [SUBCAT:PERS] [NUM:SG] [CASE=ADE] [CASECHANGE=UP].))  +<wrap lo>%%*%%) The corpus includes tags in a condensed form, e.g. V:Sg:Nom:Act:PrfPrc:Pos corresponds to [POS=V] [NUM=SG] [CASE=NOM] [VOICE=ACT] [PCP=PRFPRC] [CMP=POS]. Similarly, Pron:Pers:Sg:Ade:Up corresponds to [POS=PRON] [SUBCAT:PERS] [NUM:SG] [CASE=ADE] [CASECHANGE=UP].</wrap> 
 + 
 +<wrap lo>%%**%%Within a single morphological tag a colon rather than period is used as a separator of the individual categories, e.g. ADJA:Pos:Nom:Sg:Fem.</wrap> 
 + 
 +<wrap lo>%%***%%Tags in the corpus do not always correspond to those listed in the detailed description. Some morphological categories are omitted in the corpus tags, e.g. pronouns are always tagged only as "P-". All tags, as used in ther corpus, are listed in the brief description.</wrap>   
 +  
 +Tag formats specified in tagset descriptions differ from those actually used in the corpus also in some other languages. Please check the tag format before making a tag query if you are not sure. In a page displaying results open the **View/Corpus-specific settings...** menu to check the //tag// option in the **Positional attributes** box and choose the //for each token// option in the **Viewing options** box.
  
 Queries including contracted forms into tagged or lemmatized texts may fail. This includes forms such as //can't// or //I'm//, which are split by the tagger into two parts (//ca//+//n't// and //I//+//'m//) with corresponding lemmas and tags. Similarly with Polish forms //byłam// or //gdybyś// (//była//+//m// and //gdyby//+//ś//). Tokenization may even introduce errors: //gdzie ś za Wisłą//. In this context, //gdzieś// is not a contraction. A query intended to find the whole contracted form should be typed in as a **Phrase**, with the split parts separated by a space. Only the individual parts of the contracted form are assigned a tag and a lemma. Queries including contracted forms into tagged or lemmatized texts may fail. This includes forms such as //can't// or //I'm//, which are split by the tagger into two parts (//ca//+//n't// and //I//+//'m//) with corresponding lemmas and tags. Similarly with Polish forms //byłam// or //gdybyś// (//była//+//m// and //gdyby//+//ś//). Tokenization may even introduce errors: //gdzie ś za Wisłą//. In this context, //gdzieś// is not a contraction. A query intended to find the whole contracted form should be typed in as a **Phrase**, with the split parts separated by a space. Only the individual parts of the contracted form are assigned a tag and a lemma.
Line 156: Line 157:
 | |doc.version|version|number| | |doc.version|version|number|
 | |doc.wordcount|document size in words|number| | |doc.wordcount|document size in words|number|
-|div|div.id|text identification|author's_last_name-shortened_title / _ACQUIS / _EUROPARL / _PRESSEUROP / _SUBTITLES / _SYNDICATE| +|div|div.id|text identification|author's_last_name-shortened_title / _ACQUIS / _EUROPARL / _PRESSEUROP / _SUBTITLES / _SYNDICATE / _BIBLE 
-| |div.group|division in|//Core// / Acquis / Europarl / PressEurop / Subtitles / Syndicate|+| |div.group|division in|//Core// / Acquis / Europarl / PressEurop / Subtitles / Syndicate / Bible |
 | |div.wordcount|number of words|number| | |div.wordcount|number of words|number|
 | |div.author|author|last name, first name| | |div.author|author|last name, first name|
Line 164: Line 165:
 | |div.pubplace|publication place|text| | |div.pubplace|publication place|text|
 | |div.pubyear|publication year|date| | |div.pubyear|publication year|date|
-| |div.txtype|text type|discussions - transcripts / drama / fiction / journalism - commentaries / journalism - news / legal texts / nonfiction / other / poetry / subtitles|+| |div.txtype|text type|discussions - transcripts / drama / fiction / journalism - commentaries / journalism - news / legal texts / nonfiction / other / poetry / subtitles / religious |
 | |div.original|is the text an original?|Yes / No| | |div.original|is the text an original?|Yes / No|
 | |div.srclang|language of the original|ar / as / az / be / bg / bl / bn / bo / bs / bt / ca / cr / cs / ct / cz / da / de / dk / eb / el / en / es / et / eu / fa / fi / fr / ga / gr / he / hi / hr / hu / hy / id / ie / is / it / ja / ka / ko / ku / lt / lv / mk / mn / ms / mt / my / ni / nl / no / pl / po / ps / pt / rm / rn / ro / ru / se / sk / sl / sq / sr / sv / ta / th / ti / tl / tr / tu / uk / un / ur / vi / zh| | |div.srclang|language of the original|ar / as / az / be / bg / bl / bn / bo / bs / bt / ca / cr / cs / ct / cz / da / de / dk / eb / el / en / es / et / eu / fa / fi / fr / ga / gr / he / hi / hr / hu / hy / id / ie / is / it / ja / ka / ko / ku / lt / lv / mk / mn / ms / mt / my / ni / nl / no / pl / po / ps / pt / rm / rn / ro / ru / se / sk / sl / sq / sr / sv / ta / th / ti / tl / tr / tu / uk / un / ur / vi / zh|
Line 172: Line 173:
 |p|p.id|unique paragraph identifier|text| |p|p.id|unique paragraph identifier|text|
 |s|s.id|unique sentence identifier|text| |s|s.id|unique sentence identifier|text|
- 
  
 ===== Acknowledgements ===== ===== Acknowledgements =====
Line 179: Line 179:
  
 ==== Texts: ==== ==== Texts: ====
 +  * The latest (13th corrected) issue of the Czech Ecumenical Translation of the Bible could be included to the corpus thanks to the [[http://www.dumbible.cz|Czech Biblical Society]], especially Petr Fryš.
   * Fiction in many Slavic and some other languages from [[http://www.uva.nl/over-de-uva/organisatie/medewerkers/content/b/a/a.a.barentsen/a.a.barentsen.html#tab_3|ASPAC – Amsterdam Slavic Parallel Aligned Corpus]] – with special thanks to  Adrian Barentsen   * Fiction in many Slavic and some other languages from [[http://www.uva.nl/over-de-uva/organisatie/medewerkers/content/b/a/a.a.barentsen/a.a.barentsen.html#tab_3|ASPAC – Amsterdam Slavic Parallel Aligned Corpus]] – with special thanks to  Adrian Barentsen
   * Political commentaries in a number of languages from the site [[http://www.project-syndicate.org/|Project Syndicate]]   * Political commentaries in a number of languages from the site [[http://www.project-syndicate.org/|Project Syndicate]]
Line 192: Line 192:
   * Norwegian texts from the publishers [[http://www.aschehoug.no/|Aschehoug &amp; co.]], [[http://www.cappelendamm.no/|Cappelen Forlag]] and [[http://www.oktober.no/|Forlaget Oktober]]   * Norwegian texts from the publishers [[http://www.aschehoug.no/|Aschehoug &amp; co.]], [[http://www.cappelendamm.no/|Cappelen Forlag]] and [[http://www.oktober.no/|Forlaget Oktober]]
   * Film subtitles from the database [[http://www.opensubtitles.org|Open Subtitles]]    * Film subtitles from the database [[http://www.opensubtitles.org|Open Subtitles]] 
- 
 ==== Pre-processing ==== ==== Pre-processing ====
  
Line 208: Line 207:
   * [[http://code.google.com/p/hunpos/|HunPOS]] for Hungarian and other languages   * [[http://code.google.com/p/hunpos/|HunPOS]] for Hungarian and other languages
   * [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Tagger for Slovak]] (thanks to Radovan Garabík)   * [[http://conference.ui.sav.sk/wikt2010/papers/01_garabik_f.pdf|Tagger for Slovak]] (thanks to Radovan Garabík)
-  * Tagger for Lithuanian (thanks to Vidas Daudaravičius and Hana Skoumalová)  
   * [[http://omilia.uio.no/obt/|Tagger]] for Norwegian (thanks to Pavel Vondřička)   * [[http://omilia.uio.no/obt/|Tagger]] for Norwegian (thanks to Pavel Vondřička)
   * [[http://nl2.ijs.si/analyze/|totale]] for Slovene (thanks to Tomaž Erjavec)   * [[http://nl2.ijs.si/analyze/|totale]] for Slovene (thanks to Tomaž Erjavec)