Corpus of Upper Sorbian

HOTKO (HOrnjoserbski Tekstowy KOrpus) is a corpus of Upper Sorbian that is being built at the Sorbian Institute in Bautzen (Budyšin). It consists of journalistic, fiction, religious and scientific texts from the middle of the 19th century until today. The largest part of the corpus consists of journalistic texts (57 %) and fiction (23 %); a number of dictionaries are included as well (12 %). Regarding the time periods covered, more than half of the texts (54 %) come from the recent period following the political change of 1989/1990. Most of the texts have been scanned and OCR'ed, but not proofread. A minor part of the corpus is presented in the original spelling. The corpus is neither morphologically annotated nor lemmatized, which may complicate querying: a search has to target individual word forms rather than lemmas.
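Because the corpus has no lemmas, one common workaround is to search for a stem followed by any ending. The sketch below is only an illustration under assumptions: the helper names `cql_prefix_query` and `matches` are hypothetical, the query string uses the generic CQL form `[word="..."]`, and the local check with Python's `re` module merely approximates how a corpus manager would match the pattern. The word forms are sample data, not corpus results.

```python
import re


def cql_prefix_query(stem: str) -> str:
    """Build a CQL-style query for all word forms starting with `stem`.

    Hypothetical helper: assumes the corpus exposes a `word` attribute
    and accepts regular expressions inside the quotes.
    """
    # re.escape guards against regex metacharacters in the stem.
    return f'[word="{re.escape(stem)}.*"]'


def matches(stem: str, token: str) -> bool:
    """Local approximation of the query: the whole token must match stem + any ending."""
    return re.fullmatch(re.escape(stem) + ".*", token) is not None


# Sample word forms (illustrative only, not taken from the corpus).
forms = ["serbski", "serbska", "serbskeho", "Serbja"]

query = cql_prefix_query("serbsk")
hits = [t for t in forms if matches("serbsk", t)]

print(query)
print(hits)
```

A prefix pattern like this also matches unrelated words that happen to share the stem, and, since most texts were not proofread after OCR, it will miss forms containing recognition errors.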

For further details about the corpus, please refer to the webpage http://www.serbski-institut.de/cms/os/48/hornjoserbski.

HOTKO is a non-reference corpus: it is planned to be continuously improved, extended and updated.

The current version of HOTKO is release 2, published in March 2021.

Citing HOTKO

Serbski Institut Budyšín: HOTKO: hornolužický textový korpus, version 2 from 6 Mar 2021. Ústav Českého národního korpusu FF UK, Praha 2021. Available on-line: <http://www.korpus.cz>.

See also