en:cnk:dotko [2015/10/24 11:23] – link fixed, version in English vaclavhorky | en:cnk:dotko [2023/09/27 13:26] (current) – [Corpus of Lower Sorbian] michalkren |
---|
====== Corpus of Lower Sorbian ====== | ====== Corpus of Lower Sorbian ====== |
| |
**DOTKO** (**DO**lnoserbski **T**ekstowy **KO**rpus) is a diachronic corpus of Lower Sorbian, being built at the Cottbus (Chóśebuz) branch of the [[http://www.serbski-institut.de/|Sorbian Institute]]. It largely consists of texts from Bramborski Casnik (a Lower Lusatian newspaper) from 1848--1933. Some of these texts have not yet been proofread; they are usually presented in the original spelling without lemmata and morphological tags, which may complicate querying the corpus. | **DOTKO** v2 (**DO**lnoserbski **T**ekstowy **KO**rpus) is an extended version of the diachronic corpus of Lower Sorbian prepared by the Cottbus-Chóśebuz branch of the [[http://www.serbski-institut.de/|Sorbian Institute]]. It includes most of the historical Lower Sorbian prints from the beginning of the 18th century until the complete ban on the public use of Sorbian in 1937. The oldest text currently in the corpus dates from 1706, the most recent from 1936. A substantial part of it consists of texts from Bramborski Casnik (a Lower Sorbian newspaper) from 1848 to 1933. The texts were obtained using the double-keying method and therefore have a relatively high transcription accuracy. The biggest improvement over version 1, however, is the normalisation and lemmatisation of the texts. With appropriate settings, it is therefore possible to search for forms in today's spelling while still retrieving the corresponding forms in historical spelling. Morphological tagging has not yet been implemented, which may prove problematic for some specific research questions. |
| |
For further details about the corpus, please refer to the webpage http://www.dolnoserbski.de/korpus/. | For more information about the corpus, see [[http://www.dolnoserbski.de/korpus/]]. The texts are also part of the Lower Sorbian Digital Library; more information can be found at [[https://www.dolnoserbski.de/biblioteka/informacije/]]. |
| |
DOTKO is a //non-reference// corpus that is planned to be continuously improved, extended and updated in the future. A note on availability: for technical reasons, the DOTKO corpus is not included in the standard corpus list for Bonito 1; it is only available via the web interface. | |
| |
| DOTKO v2 is a //non-reference// corpus, and there are plans to continuously improve, expand and update it in the future. |
| |
===== Citing DOTKO ===== | ===== Citing DOTKO ===== |
<WRAP round tip 30%> | <WRAP round tip 30%> |
Serbski Institut, Oddělení dolnolužickosrbského výzkumu Chotěbuz: DOTKO: dolnolužický textový korpus, version 1 from 20 Dec 2010. Ústav Českého národního korpusu FF UK, Praha 2010. Available on-line: <http://www.korpus.cz>. | Serbski Institut, Oddělení dolnolužickosrbského výzkumu Chotěbuz: DOTKO: dolnolužický textový korpus, version 2 from 27 Sep 2023. Ústav Českého národního korpusu FF UK, Praha 2023. Available on-line: <http://www.korpus.cz>. |
</WRAP> | </WRAP> |
| |