Corpus of Upper Sorbian

HOTKO (HOrnjoserbski Tekstowy KOrpus) is a corpus of Upper Sorbian that is being built at the Sorbian Institute in Bautzen (Budyšin). It consists of journalistic, fiction, religious and scientific texts from the mid-19th century to the present. Most of the corpus consists of journalistic texts (57 %) and fiction (23 %); a number of dictionaries are included as well (12 %). In terms of the time periods covered, more than half of the texts (54 %) come from the period after the political change of 1989/1990. Most of the texts have been scanned and OCR'ed but not proofread. A minor part of the corpus is presented in the original spelling. The corpus is neither morphologically annotated nor lemmatized, which may complicate querying.

For further details about the corpus please refer to the webpage

HOTKO is a non-reference corpus that is planned to be continuously improved, extended and updated. Note also that, for technical reasons, the HOTKO corpus is not included in the standard corpus list for Bonito 1; it is available only via the web interface.

Citing HOTKO

Serbski Institut Budyšín: HOTKO: hornolužický textový korpus, version 1 from 6 Mar 2013. Ústav Českého národního korpusu FF UK, Praha 2010. Available on-line: <>.

