en:manualy:korpusdb [2021/02/15 11:15] (current) – jankocek
====== KorpusDB: Database of word forms and lemmas attested in the CNC corpora ======

{{ :manualy:korpusdb_logo.png?nolink&200|}}

The database contains all recognized word forms of all lemmata that actually occur in any of the processed CNC corpora: [[cnk:syn:verze8|SYN v8]] (contemporary written Czech), [[cnk:oral|ORAL v1]] and [[cnk:ortofon|ORTOFON v1]] (contemporary spoken Czech), [[cnk:diakorp|DIAKORP v6]] and an unpublished corpus of 19th century texts. Since the lemmatization and POS tagging of these corpora may differ, internal versions with a common tagging were processed instead.