Corpus of Lower Sorbian

DOTKO v2 (DOlnoserbski Tekstowy KOrpus) is an extended version of the diachronic corpus of Lower Sorbian prepared by the Cottbus-Chóśebuz branch of the Sorbian Institute. It covers most of the historical Lower Sorbian printed texts from the beginning of the 18th century until the complete ban on the public use of Sorbian in 1937. The oldest text currently in the corpus dates from 1706, the most recent from 1936. A substantial part consists of texts from Bramborski Casnik (a Lower Sorbian newspaper) from 1848 to 1933. The texts were transcribed using the double-keying method and therefore have relatively high transcription accuracy. The biggest improvement over version 1 is the normalisation and lemmatisation of the texts: with appropriate settings, it is possible to search for forms in today's spelling, with the corresponding historical spellings included in the search. Morphological tagging has not yet been implemented, which may be a limitation for some research questions.

For more information about the corpus, see http://www.dolnoserbski.de/korpus/. The texts are also part of the Lower Sorbian Digital Library; more information can be found at https://www.dolnoserbski.de/biblioteka/informacije/.

DOTKO v2 is a non-reference corpus, and there are plans to continuously improve, expand and update it in the future.

Citing DOTKO

Serbski Institut, Oddělení dolnolužickosrbského výzkumu Chotěbuz: DOTKO: dolnolužický textový korpus, version 2 from 27 Sep 2023. Ústav Českého národního korpusu FF UK, Praha 2023. Available on-line: <http://www.korpus.cz>.

