
Differences

This shows you the differences between two versions of the page.

en:cnk:dotko [2015/10/24 11:09] – created Václav Horký
en:cnk:dotko [2023/09/27 13:26] (current) – [Corpus of Lower Sorbian] Michal Křen
Line 2: Line 2:
 ====== Corpus of Lower Sorbian ======
  
-**DOTKO** (**DO**lnoserbski **T**ekstowy **KO**rpus) is diachronic corpus of Lower Sorbian, being built at the Cottbus (Chóśebuz) branch of the Sorbian Institute. It largely consists of the Bramborski Casnik texts (Lower Lusatian newspaper) from 1848--1933. Some of these texts have not yet been proofread; they are usually presented in the original spelling without lemmata and morphological tags, which may complicate querying the corpus.
+**DOTKO** v2 (**DO**lnoserbski **T**ekstowy **KO**rpus) is an extended version of the diachronic corpus of Lower Sorbian prepared by the Cottbus-Chóśebuz branch of the [[http://www.serbski-institut.de/|Sorbian Institute]]. It includes the largest part of historical Lower Sorbian prints from the beginning of the 18th century until the complete ban on the public use of Sorbian in 1937. The oldest text in the corpus at this time dates from 1706, the most recent from 1936. A substantial part of it consists of the texts of Bramborski Casnik (Lower Sorbian newspaper) from 1848 to 1933. The texts were obtained using the double-keying method and therefore have a relatively high transcription accuracy. However, the biggest improvement over version 1 is the normalisation and lemmatisation of the texts. With appropriate settings, it is therefore possible to search for forms in today's spelling, while historical spelling forms are still searched accordingly. Morphological tagging has not yet been implemented, which may prove problematic for some specific research questions.
  
-For further details about the corpus please refer to the webpage http://www.dolnoserbski.de/korpus/.
+For more information about the corpus, see [[http://www.dolnoserbski.de/korpus/]]. The texts are also part of the Lower Sorbian Digital Library; more information can be found at [[https://www.dolnoserbski.de/biblioteka/informacije/]].
- +
-DOTKO is a non-reference corpus that is planned to be continuously improved, extended and updated in the future. Another notice relates to its availability: for technical reasons, the DOTKO corpus is not included in the standard corpus list for Bonito 1; it is only available via the web interface.
+
  
+DOTKO v2 is a //non-reference// corpus, and there are plans to continuously improve, expand and update it in the future.
  
 ===== Citing DOTKO =====
 <WRAP round tip 30%>
-Serbski Institut, Oddělení dolnolužickosrbského výzkumu Chotěbuz: DOTKO: dolnolužický textový korpus, version 2 from 27 Sep 2023. Ústav Českého národního korpusu FF UK, Praha 2023. Available on-line: <http://www.korpus.cz>.
 </WRAP>
  
Line 18: Line 17:
  
 <WRAP round box 49%>
-[[cnk:hotko|HOTKO – Corpus of Upper Sorbian]] • [[en:cnk:uvod|CNC corpora]]
+[[en:cnk:hotko|HOTKO – Corpus of Upper Sorbian]] • [[en:cnk:uvod|CNC corpora]]
 </WRAP>