
Differences

This shows you the differences between two versions of the page.

en:cnk:dialekt [2021/12/25 01:00] lukes
en:cnk:dialekt [2022/01/05 16:01] (current) – [How to cite] martinawaclawicova
Line 17: Line 17:
 The **DIALEKT** corpus presents traditional regional dialects captured over the entire Czech Republic. The dialect material was acquired by transcribing sound recordings coming from all dialectal regions of the Czech Republic. Additionally, several probes were recorded in Poland. The corpus is composed of two levels. The older dialectal level contains recordings which were made in the period from the end of the 1950s until the 1980s. The newer level contains probes covering the period from the 1990s until the present. For both layers, we have language data which capture archaic dialectal elements which do not generally occur in present-day usage.
  
-The first version of the dialect corpus contains approx. 100 000 words and will gradually expand. We assume that it will serve not only for specialists (dialectologists, other linguists and researchers from related fields) but also for example as a practical learning aid for high schools and universities. In the future, it should also be supplemented with interactive maps with dialectal features from the individual regional dialects, excerpts from transcripts and recordings from selected locations, and other useful additions.
+The second version of the dialect corpus contains more than 220 000 words and will gradually expand. We assume that it will serve not only for specialists (dialectologists, other linguists and researchers from related fields) but also for example as a practical learning aid for high schools and universities. In the future, it should also be supplemented with interactive maps with dialectal features from the individual regional dialects, excerpts from transcripts and recordings from selected locations, and other useful additions.
  
 ====== Composition of DIALEKT and data collection ======
Line 45: Line 45:
 ===== Map of dialect regions in CR =====
  
-{{:cnk:oblasti_ridsi_mod2.jpg?direct&500| Map of dialect regions in CR}}
+{{:en:cnk:oblasti_ridsi_2021_wiki.png?direct&500| Map of dialect regions in CR}}
 ====== Processing dialect recordings ======
  
Line 66: Line 66:
  
 Goláňová, H. – Waclawičová, M. – Komrsková, Z. – Lukeš, D. – Kopřivová, M. – Poukarová, P.: //DIALEKT: nářeční korpus, verze 1 z 2. 6. 2017//. Ústav Českého národního korpusu FF UK, Praha 2017. Retrieved from: http://www.korpus.cz\\
+
+Goláňová, H. – Waclawičová, M. (2019): The DIALEKT corpus and its possibilities. Jazykovedný časopis, 70(2), 336-344. ISSN 0021-5597.
  
 Komrsková, Z. - Kopřivová, M. - Lukeš, D. - Poukarová, P. - Goláňová, H. (2017): New Spoken Corpora of Czech: ORTOFON and DIALEKT. //Jazykovedný časopis//, 68(2), 219-228. ISSN 0021-5597.
Line 71: Line 73:
 Goláňová, H. (2015): A new dialect corpus: DIALEKT. In Katarína Gajdošová - Adriana Žáková (eds.): //Proceedings of the Eighth International Conference Slovko 2015 (Natural Language Processing, Corpus Linguistics, Lexicography)//. Lüdenscheid: RAM-Verlag, 36-44. ISBN 978-3-942303-32-3.\\
  
-Goláňová, H. – Kopřivová, M. – Lukeš, D. – Štěpán, M. (2015): Kartografické a geografické zpracování dat z mluvených korpusů. In //Korpus – gramatika – axiologie//, 11, 42-54. ISSN: 1804-137X 
 </WRAP>
- 
-Corpus compilation and project coordination was secured by //Hana Goláňová//, corpus preparation and proofreading of transcription by //Martina Waclawičová//, the orthographic transcription tier by //Zuzana Komrsková//, technical creation of the corpus by //David Lukeš// and lemmatization and morphological tagging was prepared by //Zuzana Komrsková//, //Marie Kopřivová//, //David Lukeš// and //Petra Poukarová//. 
  
 ===== Related links =====