This is an old revision of the document!
Taggers & lemmatizers
Language | Ahmet Akker (tool) | BTTtagger (tool) | COMPOST (tool) | Freeling (tool) | MYSTEM (tool) | NLTK (tool) | RDRPOSTagger (tool) | RFTagger (tool) | Stanford (tool) | Treetagger (tool) | Other |
---|---|---|---|---|---|---|---|---|---|---|---|
Arabic | x | Madamira (web service, tool) | |||||||||
Asturian | x | ||||||||||
Belarusian | x | ||||||||||
Bengali | x | ||||||||||
Bulgarian | x | x | DCL (tool) | ||||||||
Catalan | x | x | |||||||||
Chinese | x | ||||||||||
Croatian | x | Nikola Ljubešić (tool) | |||||||||
Czech | x | x | x | x | MorphoDiTa (tool) | ||||||
Danish | x | CST (web, tool) | |||||||||
Dutch | x | x | x | x | x | Brill-NL (web, tool) | |||||
English | x | x | x | x | x | MorphoDiTa (tool) | |||||
Estonian | x | x | |||||||||
Finnish | x | OMorFi (tool) | |||||||||
French | x | x | x | x | x | ||||||
Galician | x | x | |||||||||
German | x | x | x | x | |||||||
Greek | ILSP (web) | ||||||||||
Hebrew | MILA (tool) | ||||||||||
Hindi | x | x | Siva Reddy (tool), Hindi Shallow Parser | ||||||||
Hungarian | x | x | hunpos (tool) | ||||||||
Icelandic | x | IceStagger (tool) | |||||||||
Indonesian | x | ||||||||||
Italian | x | x | x | x | x | ||||||
Japanese | x | mecab (tool) | |||||||||
Lao | x | ||||||||||
Macedonian | x | ||||||||||
Maltese | Maltese Language Resource Server (web) | ||||||||||
Malay | |||||||||||
Marathi | x | ||||||||||
Mongolian | x | ||||||||||
Norwegian | obt (tool) | ||||||||||
Persian | hazm (tool) | ||||||||||
Polish | x | x | x | TaKIPI (tool), Pantera (tool) | |||||||
Portuguese | x | x | x | x | |||||||
Romanian | x | RACAI (web) | |||||||||
Russian | x | x | x | x | |||||||
Serbian | x | Nikola Ljubešić | |||||||||
Slovak | x | x | Morče (tool) | ||||||||
Slovene | x | x | ToTaLe (tool) | ||||||||
Spanish | x | x | x | x | x | ||||||
Swahili | x | ||||||||||
Swedish | x | x | Stagger (tool) | ||||||||
Telugu | x | ||||||||||
Thai | x | ||||||||||
Turkish | ITU Turkish Natural Language Processing Pipeline (web), Trmorph (tool, MA) | ||||||||||
Ukrainian | x | ugtag (tool) | |||||||||
Vietnamese | x | vnTagger (tool), Vietnamese Language and Speech Processing (VLSP) / VietTagger | |||||||||
Welsh | x |
For additional resources, see the Wiki of the Association for Computational Linguistics, List of resources by language.
Tools of varied coverage for more languages may be found at https://languagetool.org.
The list does not include tools without a disambiguation component, such as the morphological analyzers Ajka and Majka.
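The difference between an analyzer without disambiguation and a tagger can be illustrated with a toy sketch. Everything in it is invented for illustration: the three-word lexicon, the tag names, and the context rule are assumptions and do not reflect how Ajka, Majka, or any tool in the table works. The analyzer returns every reading of a token in isolation, while the tagger commits to one reading per token based on context.

```python
from typing import Dict, List, Optional, Tuple

Analysis = Tuple[str, str]  # (lemma, tag)

# Hypothetical mini-lexicon; real analyzers use large morphological dictionaries.
LEXICON: Dict[str, List[Analysis]] = {
    "the": [("the", "DET")],
    "dogs": [("dog", "NOUN"), ("dog", "VERB")],
    "bark": [("bark", "NOUN"), ("bark", "VERB")],
}


def analyze(token: str) -> List[Analysis]:
    """Morphological analysis: all readings listed for the token, no context used."""
    key = token.lower()
    return LEXICON.get(key, [(key, "X")])  # unknown words get a placeholder tag


def tag(tokens: List[str]) -> List[Analysis]:
    """Tagging: one reading per token, picked by a crude, made-up context rule."""
    result: List[Analysis] = []
    for token in tokens:
        readings = analyze(token)
        prev_tag: Optional[str] = result[-1][1] if result else None
        # Assumed rule: prefer a noun after a determiner and a verb after a noun;
        # otherwise fall back to the first reading in the lexicon.
        preferred = {"DET": "NOUN", "NOUN": "VERB"}.get(prev_tag)
        chosen = next((r for r in readings if r[1] == preferred), readings[0])
        result.append(chosen)
    return result


if __name__ == "__main__":
    print(analyze("bark"))                # both readings, still ambiguous
    print(tag(["The", "dogs", "bark"]))   # one reading per token
```

Real taggers replace the hand-written rule with a statistical or neural model trained on annotated text, but the input and output shapes are roughly as shown.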
Tools currently used in InterCorp, the parallel section of the Czech National Corpus, are underlined.
— Alexandr Rosen & corpora@uib.no subscribers