Taggers & lemmatizers
Language | Ahmet Akker (tool) | Apertium (tool) | BTTtagger (tool) | COMPOST (tool) | Freeling (tool) | MYSTEM (tool) | NLTK (tool) | RDRPOSTagger (tool) | RFTagger (tool) | Stanford (tool) | Treetagger (tool) | Other |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Arabic | x | x | Madamira (web, tool) | |||||||||
Asturian | x | |||||||||||
Belarusian | x | |||||||||||
Bengali | x | |||||||||||
Bulgarian | x | x | x | DCL (tool) | ||||||||
Catalan | x | x | x | |||||||||
Chinese | x | |||||||||||
Croatian | x | x | Nikola Ljubešić (tool) | |||||||||
Czech | x | x | x | x | MorphoDiTa (tool) | |||||||
Danish | x | CST (web, tool) | ||||||||||
Dutch | x | x | x | x | x | x | Brill-NL (web, tool), Frog (tool) | |||||
English | x | x | x | x | x | x | MorphoDiTa (tool) | |||||
Estonian | x | x | ||||||||||
Finnish | x | OMorFi (tool) | ||||||||||
French | x | x | x | x | x | x | ||||||
Galician | x | x | ||||||||||
German | x | x | x | x | x | |||||||
Greek | ILSP (web) | |||||||||||
Hebrew | MILA (tool) | |||||||||||
Hindi | x | x | x | Siva Reddy (tool), Hindi Shallow Parser (web) | ||||||||
Hungarian | x | x | hunpos (tool) | |||||||||
Icelandic | x | IceStagger (tool) | ||||||||||
Indonesian | x | |||||||||||
Italian | x | x | x | x | x | |||||||
Japanese | x | mecab (tool) | ||||||||||
Lao | x | |||||||||||
Macedonian | x | x | ||||||||||
Maltese | x | Maltese Language Resource Server (web) | ||||||||||
Malay | ||||||||||||
Marathi | x | |||||||||||
Mongolian | x | |||||||||||
Norwegian | obt (tool) | |||||||||||
Persian | hazm (tool) | |||||||||||
Polish | x | x | x | TaKIPI, Pantera, Concraft, WCRFT (tools) 1) | ||||||||
Portuguese | x | x | x | x | ||||||||
Romanian | x | x | RACAI (web) | |||||||||
Russian | x | x | x | x | ||||||||
Serbian | x | x | Nikola Ljubešić (tool) | |||||||||
Slovak | x | x | Morče (tool) | |||||||||
Slovene | x | x | ToTaLe (tool) | |||||||||
Spanish | x | x | x | x | x | |||||||
Swahili | x | |||||||||||
Swedish | x | x | Stagger (tool) | |||||||||
Telugu | x | |||||||||||
Thai | x | |||||||||||
Turkish | x | ITU Turkish Natural Language Processing Pipeline (web), Trmorph (tool, MA) | ||||||||||
Ukrainian | x | x | ugtag (tool) | |||||||||
Vietnamese | x | vnTagger, Vietnamese Language and Speech Processing (VLSP) / VietTagger (tools) | ||||||||||
Welsh | x |
For additional resources, see the Wiki of the Association for Computational Linguistics: List of resources by language.
Tools of varied coverage for more languages may be found at https://languagetool.org.
The list does not include tools that lack a disambiguation component, such as the morphological analyzers Ajka and Majka.
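The distinction between an analyzer and a tagger can be shown with a toy sketch. It does not reproduce the behavior of Ajka, Majka, or any tool in the table: the lexicon, the tag-bigram scores, and the function names `analyze` and `tag` are all hypothetical, and the greedy left-to-right choice merely stands in for whatever statistical or rule-based disambiguation a real tagger uses.

```python
# Hypothetical toy lexicon: each word form maps to ALL of its
# possible (lemma, tag) analyses, regardless of context.
LEXICON = {
    "the": [("the", "DET")],
    "we": [("we", "PRON")],
    "can": [("can", "NOUN"), ("can", "VERB")],
    "rusts": [("rust", "VERB"), ("rust", "NOUN")],
    "swim": [("swim", "VERB"), ("swim", "NOUN")],
}

# Hypothetical preferences for a tag following the previous tag.
BIGRAM_SCORE = {
    ("<s>", "DET"): 2,
    ("<s>", "PRON"): 2,
    ("DET", "NOUN"): 3,
    ("NOUN", "VERB"): 2,
    ("PRON", "VERB"): 3,
    ("VERB", "VERB"): 2,
}


def analyze(token):
    """Morphological analysis only: return every candidate analysis."""
    return LEXICON.get(token.lower(), [(token.lower(), "X")])


def tag(tokens):
    """Greedy disambiguation: keep one analysis per token, choosing the
    candidate whose tag scores best after the previously chosen tag."""
    previous_tag = "<s>"
    result = []
    for token in tokens:
        lemma, best_tag = max(
            analyze(token),
            key=lambda analysis: BIGRAM_SCORE.get((previous_tag, analysis[1]), 0),
        )
        result.append((token, lemma, best_tag))
        previous_tag = best_tag
    return result


if __name__ == "__main__":
    print(analyze("can"))                  # ambiguous: two analyses
    print(tag(["the", "can", "rusts"]))    # one analysis per token
    print(tag(["we", "can", "swim"]))
```

Here `analyze` leaves "can" ambiguous between noun and verb, while `tag` commits to one reading per token in context. The tools in the table perform that second step.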
Tools currently used in InterCorp, the parallel section of the Czech National Corpus, are underlined.
— Alexandr Rosen & corpora@uib.no subscribers
1) For more tools see Language Tools and Resources for Polish.