

deWaC

A 1.7-billion-word corpus built from the Web by limiting the crawl to the .de domain and using medium-frequency words from the Süddeutsche Zeitung corpus and basic German vocabulary lists as seeds. The corpus was POS-tagged and lemmatized with TreeTagger. More information is available on the WaCky corpora page linked below.
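The seed-selection and domain-restriction steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure actually used for deWaC. The rank band for "medium frequency", the idea of turning seed words into random word tuples for search-engine queries (a BootCaT-style approach), and all function names (`medium_frequency_words`, `seed_tuples`, `in_de_domain`) are hypothetical.

```python
import random
from urllib.parse import urlparse


def medium_frequency_words(freqs, low_rank, high_rank):
    """Return words whose frequency rank falls in [low_rank, high_rank).

    `freqs` maps word -> count; rank 0 is the most frequent word.
    The rank band is a hypothetical parameter, not the deWaC setting.
    """
    # Sort by descending count, breaking ties alphabetically for determinism.
    ranked = sorted(freqs, key=lambda w: (-freqs[w], w))
    return ranked[low_rank:high_rank]


def seed_tuples(words, n_tuples, tuple_size, rng):
    """Draw random tuples of distinct seed words (assumed to be used as search queries)."""
    return [tuple(rng.sample(words, tuple_size)) for _ in range(n_tuples)]


def in_de_domain(url):
    """Return True if the URL's host is under the .de top-level domain."""
    host = urlparse(url).hostname or ""  # hostname is already lowercased
    return host.endswith(".de")


if __name__ == "__main__":
    # Toy frequency list standing in for a newspaper-derived word list.
    freqs = {"der": 1000, "und": 900, "Haus": 50, "Zeitung": 40, "Brücke": 20, "Gipfel": 5}
    seeds = medium_frequency_words(freqs, 2, 5)
    print(seeds)
    print(seed_tuples(seeds, 2, 2, random.Random(0)))
    print(in_de_domain("https://www.sueddeutsche.de/politik"))
```

Skipping the top-ranked words drops function words such as "der" and "und", which would match almost any German page. Skipping the tail avoids rare words that return too few hits to be useful as seeds.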

http://wacky.sslmit.unibo.it/doku.php?id=corpora