Taggers & lemmatizers
Notice: The list is not kept up to date (last updated in February 2015).
Language | Ahmet Akker (tool) | Apertium (tool) | BTTtagger (tool) | COMPOST (tool) | FreeLing (tool) | MYSTEM (tool) | NLTK (tool) | RDRPOSTagger (tool) | RFTagger (tool) | Stanford (tool) | TreeTagger (tool) | Other |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Arabic | x | x | Madamira (web, tool) | |||||||||
Asturian | x | x | ||||||||||
Belarusian | x | |||||||||||
Bengali | x | |||||||||||
Bulgarian | x | x | x | DCL (tool) | ||||||||
Catalan | x | x | x | |||||||||
Chinese | x | |||||||||||
Croatian | x | x | Nikola Ljubešić (tool) | |||||||||
Czech | x | x | x | x | MorphoDiTa (tool) | |||||||
Danish | x | x | CST (web, tool) | |||||||||
Dutch | x | x | x | x | x | x | Brill-NL (web, tool), Frog (tool) | |||||
English | x | x | x | x | x | x | MorphoDiTa (tool) | |||||
Estonian | x | x | ||||||||||
Finnish | x | OMorFi (tool) | ||||||||||
French | x | x | x | x | x | x | x | |||||
Galician | x | x | x | |||||||||
German | x | x | x | x | x | |||||||
Greek | ILSP (web) | |||||||||||
Hebrew | x | MILA (tool) | ||||||||||
Hindi | x | x | x | Siva Reddy (tool), Hindi Shallow Parser (web) | ||||||||
Hungarian | x | x | hunpos (tool) | |||||||||
Icelandic | x | x | IceStagger (tool) | |||||||||
Indonesian | x | |||||||||||
Italian | x | x | x | x | x | x | ||||||
Japanese | x | mecab (tool) | ||||||||||
Lao | x | |||||||||||
Macedonian | x | x | ||||||||||
Maltese | x | Maltese Language Resource Server (web) | ||||||||||
Malay | ||||||||||||
Marathi | x | |||||||||||
Mongolian | x | |||||||||||
Norwegian | x | obt (tool) | ||||||||||
Persian | hazm (tool) | |||||||||||
Polish | x | x | x | TaKIPI, Pantera, Concraft, WCRFT (tools)1) | ||||||||
Portuguese | x | x | x | x | x | |||||||
Romanian | x | x | RACAI (web) | |||||||||
Russian | x | x | x | x | x | |||||||
Serbian | x | x | Nikola Ljubešić (tool) | |||||||||
Slovak | x | x | Morče (tool) | |||||||||
Slovene | x | x | ToTaLe (tool) | |||||||||
Spanish | x | x | x | x | x | x | ||||||
Swahili | x | |||||||||||
Swedish | x | x | x | Stagger (tool) | ||||||||
Telugu | x | |||||||||||
Thai | x | |||||||||||
Turkish | x | ITU Turkish Natural Language Processing Pipeline (web), TRmorph (tool, morphological analyzer) | ||||||||||
Ukrainian | x | x | ugtag (tool) | |||||||||
Vietnamese | x | vnTagger, Vietnamese Language and Speech Processing (VLSP) / VietTagger (tools) | ||||||||||
Welsh | x | x |
Note: The list does not include tools without a disambiguation component, such as the morphological analyzers Ajka and Majka.
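The distinction in the note can be shown with a toy example. A morphological analyzer returns every possible reading of a word form. A tagger adds a disambiguation step that picks one reading in context. The Python sketch below illustrates this under stated assumptions and does not describe how any listed tool works. The lexicon `LEXICON`, the tag set, and the scores in `BIGRAM` are all hypothetical. The greedy left-to-right choice simplifies the sentence-level search that trained statistical taggers typically perform.

```python
# Toy illustration: morphological analysis vs. disambiguation.
# Everything here (lexicon, tags, scores) is hypothetical.

# Hypothetical lexicon: word form -> set of (lemma, tag) readings.
LEXICON = {
    "the": {("the", "DET")},
    "can": {("can", "MODAL"), ("can", "NOUN"), ("can", "VERB")},
    "rusts": {("rust", "NOUN"), ("rust", "VERB")},
}

# Hypothetical tag-bigram scores standing in for a trained model;
# "<s>" marks the sentence start, unseen pairs score 0.0.
BIGRAM = {
    ("<s>", "DET"): 1.0,
    ("DET", "NOUN"): 0.9,
    ("DET", "MODAL"): 0.05,
    ("DET", "VERB"): 0.05,
    ("NOUN", "VERB"): 0.6,
    ("NOUN", "NOUN"): 0.3,
}


def analyze(word):
    """Analyzer only: return all readings of a word form, with no choice made.

    Unknown forms get a single placeholder reading tagged "UNK".
    """
    form = word.lower()
    return sorted(LEXICON.get(form, {(form, "UNK")}))


def tag(words):
    """Tagger: analyze each word, then keep one (lemma, tag) reading per word.

    Greedy left-to-right choice: pick the reading whose tag scores highest
    after the previously chosen tag.
    """
    prev_tag = "<s>"
    result = []
    for word in words:
        lemma, best_tag = max(
            analyze(word),
            key=lambda reading: BIGRAM.get((prev_tag, reading[1]), 0.0),
        )
        result.append((word, lemma, best_tag))
        prev_tag = best_tag
    return result


if __name__ == "__main__":
    print(analyze("can"))
    print(tag(["the", "can", "rusts"]))
```

`analyze("can")` leaves all three readings of *can* open. With these made-up scores, `tag(["the", "can", "rusts"])` returns `[("the", "the", "DET"), ("can", "can", "NOUN"), ("rusts", "rust", "VERB")]`. The tagger resolves *can* to a noun and picks the verb reading of *rusts*, with the lemma *rust*.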
For additional resources, see the Wiki of the Association for Computational Linguistics (List of resources by language) and the list of tools of varied coverage for more languages.
Tools currently used in InterCorp, the parallel section of the Czech National Corpus, are underlined.
— Alexandr Rosen & corpora@uib.no subscribers
1) For more tools, see Language Tools and Resources for Polish.