
This is an older version of the document!


ukWaC:

a 2-billion-word corpus constructed from the Web by limiting the crawl to the .uk domain and using medium-frequency words from the BNC as seeds. The corpus was POS-tagged and lemmatized with the TreeTagger. The tagset is available here; more information can be found in this paper.
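
For readers who want to work with the tagged and lemmatized text, the following is a minimal sketch of reading TreeTagger-style output in Python. It assumes one token per line in the form word, POS tag, lemma, separated by tabs, with `<s>` and `</s>` lines marking sentence boundaries; this layout is an assumption and should be checked against the actual corpus files. The names `Token` and `read_sentences` are hypothetical helpers, not part of any corpus tooling.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    word: str
    pos: str
    lemma: str


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences as lists of Token from tab-separated lines.

    Assumed layout (not confirmed by the page): word<TAB>POS<TAB>lemma,
    with </s> closing each sentence. Any line that does not have exactly
    three tab-separated fields (e.g. <s> or <text ...>) is skipped.
    """
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "</s>":
            if sentence:
                yield sentence
            sentence = []
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # markup or malformed line
        sentence.append(Token(*parts))
    if sentence:
        # Flush a trailing sentence that was never closed with </s>.
        yield sentence


# Invented sample lines for illustration, not taken from the corpus.
sample = [
    "<s>\n",
    "The\tDT\tthe\n",
    "dogs\tNNS\tdog\n",
    "barked\tVVD\tbark\n",
    "</s>\n",
]
sentences = list(read_sentences(sample))
lemmas = [tok.lemma for tok in sentences[0]]
```

For a real file, an open file handle can be passed instead of `sample`, so the corpus is processed line by line rather than loaded into memory at once.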

http://wacky.sslmit.unibo.it/doku.php?id=corpora