Taggers & lemmatizers

Notice: This list is not kept up to date (last updated 2/2015).

Language | Ahmet Akker (tool) | Apertium (tool) | BTTtagger (tool) | COMPOST (tool) | Freeling (tool) | MYSTEM (tool) | NLTK (tool) | RDRPOSTagger (tool) | RFTagger (tool) | Stanford (tool) | Treetagger (tool) | Other
Arabic x x Madamira (web, tool)
Asturian x x
Belarusian x
Bengali x
Bulgarian x x x DCL (tool)
Catalan x x x
Chinese x
Croatian x x Nikola Ljubešić (tool)
Czech x x x x MorphoDiTa (tool)
Danish x x CST (web, tool)
Dutch x x x x x x Brill-NL (web, tool), Frog (tool)
English x x x x x x MorphoDiTa (tool)
Estonian x x
Finnish x OMorFi (tool)
French x x x x x x x
Galician x x x
German x x x x x
Greek ILSP (web)
Hebrew x MILA (tool)
Hindi x x x Siva Reddy (tool), Hindi Shallow Parser (web)
Hungarian x x hunpos (tool)
Icelandic x x IceStagger (tool)
Indonesian x
Italian x x x x x x
Japanese x mecab (tool)
Lao x
Macedonian x x
Maltese x Maltese Language Resource Server (web)
Malay
Marathi x
Mongolian x
Norwegian x obt (tool)
Persian hazm (tool)
Polish x x x TaKIPI, Pantera, Concraft, WCRFT (tools)
Portuguese x x x x x
Romanian x x RACAI (web)
Russian x x x x x
Serbian x x Nikola Ljubešić (tool)
Slovak x x Morče (tool)
Slovene x x ToTaLe (tool)
Spanish x x x x x x
Swahili x
Swedish x x x Stagger (tool)
Telugu x
Thai x
Turkish x ITU Turkish Natural Language Processing Pipeline (web), Trmorph (tool, MA)
Ukrainian x x ugtag (tool)
Vietnamese x vnTagger, Vietnamese Language and Speech Processing (VLSP) / VietTagger (tools)
Welsh x x

Note: The list does not include tools without a disambiguation component, such as the morphological analyzers Ajka and Majka.
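
The distinction drawn in the note can be illustrated with a minimal sketch. It does not reflect how Ajka, Majka, or any tool in the table works: the lexicon, the tag labels, and the function names `analyze` and `tag` are hypothetical, and the single disambiguation rule is a deliberately crude stand-in for the statistical or rule-based models that real taggers use. An analyzer returns every possible reading of a word form, while a tagger also picks one reading in context.

```python
# Hypothetical toy lexicon: each word form maps to every (lemma, tag)
# reading it could have, regardless of context.
LEXICON = {
    "the": [("the", "DET")],
    "can": [("can", "AUX"), ("can", "NOUN"), ("can", "VERB")],
    "rusts": [("rust", "VERB"), ("rust", "NOUN")],
}


def analyze(word):
    """Morphological analysis only: return all candidate (lemma, tag) pairs."""
    return LEXICON.get(word.lower(), [(word, "X")])


def tag(sentence):
    """Analysis plus disambiguation: choose one reading per token.

    Illustrative rule only: directly after a determiner prefer a NOUN
    reading; otherwise take the first reading listed in the lexicon.
    """
    result = []
    prev_tag = None
    for word in sentence:
        candidates = analyze(word)
        choice = candidates[0]
        if prev_tag == "DET":
            for cand in candidates:
                if cand[1] == "NOUN":
                    choice = cand
                    break
        result.append((word, choice[0], choice[1]))
        prev_tag = choice[1]
    return result


if __name__ == "__main__":
    print(analyze("can"))                 # all three readings, undisambiguated
    print(tag(["The", "can", "rusts"]))   # one (lemma, tag) per token
```

Tools of the first kind stop at `analyze`; the tools listed in the table correspond to the second kind, which commit to one lemma and tag per token.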

For additional resources, see the Wiki of the Association for Computational Linguistics (List of resources by language) and the list of tools of varied coverage for more languages.

Tools currently used in InterCorp, the parallel section of the Czech National Corpus, are underlined.

Alexandr Rosen & corpora@uib.no subscribers