Taggers & lemmatizers
| Ahmet Akker (tool) | Apertium (tool) | BTTtagger (tool) | COMPOST (tool) | FreeLing (tool) | MYSTEM (tool) | NLTK (tool) | RDRPOSTagger (tool) | RFTagger (tool) | Stanford (tool) | TreeTagger (tool) | Other | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Arabic | x | x | Madamira (web service, tool) | |||||||||
| Asturian | x | |||||||||||
| Belarusian | x | |||||||||||
| Bengali | x | |||||||||||
| Bulgarian | x | x | x | DCL (tool) | ||||||||
| Catalan | x | x | x | |||||||||
| Chinese | x | |||||||||||
| Croatian | x | x | Nikola Ljubešić (tool) | |||||||||
| Czech | x | x | x | x | MorphoDiTa (tool) | |||||||
| Danish | x | CST (web, tool) | ||||||||||
| Dutch | x | x | x | x | x | x | Brill-NL (web, tool) | |||||
| English | x | x | x | x | x | MorphoDiTa (tool) | ||||||
| Estonian | x | x | ||||||||||
| Finnish | x | OMorFi (tool) | ||||||||||
| French | x | x | x | x | x | |||||||
| Galician | x | x | ||||||||||
| German | x | x | x | x | ||||||||
| Greek | ILSP (web) | |||||||||||
| Hebrew | MILA (tool) | |||||||||||
| Hindi | x | x | x | Siva Reddy (tool), Hindi Shallow Parser | ||||||||
| Hungarian | x | x | hunpos (tool) | |||||||||
| Icelandic | x | IceStagger (tool) | ||||||||||
| Indonesian | x | |||||||||||
| Italian | x | x | x | x | x | |||||||
| Japanese | x | MeCab (tool) | ||||||||||
| Lao | x | |||||||||||
| Macedonian | x | x | ||||||||||
| Maltese | x | Maltese Language Resource Server (web) | ||||||||||
| Malay | ||||||||||||
| Marathi | x | |||||||||||
| Mongolian | x | |||||||||||
| Norwegian | obt (tool) | |||||||||||
| Persian | hazm (tool) | |||||||||||
| Polish | x | x | x | TaKIPI, Pantera, Concraft, WCRFT (tools)1) | ||||||||
| Portuguese | x | x | x | x | ||||||||
| Romanian | x | x | RACAI (web) | |||||||||
| Russian | x | x | x | x | ||||||||
| Serbian | x | x | Nikola Ljubešić (tool) | |||||||||
| Slovak | x | x | Morče (tool) | |||||||||
| Slovene | x | x | ToTaLe (tool) | |||||||||
| Spanish | x | x | x | x | x | |||||||
| Swahili | x | |||||||||||
| Swedish | x | x | Stagger (tool) | |||||||||
| Telugu | x | |||||||||||
| Thai | x | |||||||||||
| Turkish | x | ITU Turkish Natural Language Processing Pipeline (web), TRmorph (tool, MA) | ||||||||||
| Ukrainian | x | x | ugtag (tool) | |||||||||
| Vietnamese | x | vnTagger (tool), Vietnamese Language and Speech Processing (VLSP) / VietTagger | ||||||||||
| Welsh | x |
For additional resources, see the Wiki of the Association for Computational Linguistics, List of resources by language.
Tools covering further languages, to varying degrees, can be found at https://languagetool.org.
The list does not include tools without a disambiguation component, such as the morphological analyzers Ajka and Majka.
Tools currently used in InterCorp, the parallel section of the Czech National Corpus, are underlined.
— Alexandr Rosen & corpora@uib.no subscribers
1) For more tools, see Language Tools and Resources for Polish.