Differences

This shows you the differences between two versions of the page.

en:cnk:dialekt [2017/07/07 10:29] veronikapojarova
en:cnk:dialekt [2017/07/18 15:00] – [Related links] michalkren
Line 27: Line 27:
 The conversations are rather informal, even though the fieldworkers (interviewers) recorded the informants (dialect speakers) in guided interviews – a method used in dialectology. Most of the transcribed dialect recordings contain largely unprepared, monologue-type speech recorded in a private domestic setting. The topics usually relate to traditional rural life and the world of the time, and thus concern agriculture, crafts, local customs and traditions, folklore, events of the period etc., e.g. Weaving, About the Cursed Snake, The beginning of World War II. In these talks, dialectisms from all language levels are preserved (phonetic and phonological, morphological, syntactic and lexical).
  
-The dialect corpus also contains an extensive sociolinguistic tagging system, which can be used to create subcorpora, as in the last two tables in the section [[en:pojmy:atributy_strukturni#strukturni_atributy_mluvenych_korpusu|Structural attributes of spoken corpora]].
+The dialect corpus also contains an extensive sociolinguistic tagging system, which can be used to create subcorpora.
  
 ===== Map of dialect regions in CR =====
Line 35: Line 34:
 ====== Processing dialect recordings ======
  
-Dialect material in the **DIALEKT** corpus is processed with two transcription tiers – dialectological and orthographic, see [[en:cnk:dialekt:pravidla|transcription principles]]. The basic transcript is dialectological and is based on the rules for the transcription of scientific dialectological texts. The second transcription tier contains the orthographic transcription, which approaches the usual form of written texts and is comparable to the general rules established for spoken corpora in the Czech National Corpus (CNC).
+Dialect material in the **DIALEKT** corpus is processed with two transcription tiers – dialectological and orthographic, see [[cnk:dialekt:pravidla|transcription principles]] (Czech only). The basic transcript is dialectological and is based on the rules for the transcription of scientific dialectological texts. The second transcription tier contains the orthographic transcription, which approaches the usual form of written texts and is comparable to the general rules established for spoken corpora in the Czech National Corpus (CNC).
 Like the **[[en:cnk:oral|ORAL]]** and **[[en:cnk:ortofon|ORTOFON]]** corpora, **DIALEKT** is [[en:cnk:lemtag_mluv|lemmatized and morphologically tagged]]. Because dialect material is highly variable and the available training data were insufficient, tagging and lemmatization were extremely complicated, which should be kept in mind when interpreting the results.
  
-After entering a query in the [[en:manualy:kontext:index|KonText]] interface, we are shown either only one selected transcription tier, or both tiers simultaneously as parallel corpora standing next to each other. It is only up to us to select the primary tier (dialectological or orthographic). This tier then displays all of the corpus functions – it is possible to play parts of the recording by the segment, change settings to display other information, [[en:pojmy:atributy_pozicni|positional]] or [[en:pojmy:atributy_strukturni#strukturni_atributy_mluvenych_korpusu|structural units and attributes]] etc., see [[en:cnk:dialekt:prace|Working with the DIALEKT corpus]].
+After entering a query in the [[en:manualy:kontext:index|KonText]] interface, we are shown either only one selected transcription tier, or both tiers simultaneously as parallel corpora standing next to each other. It is only up to us to select the primary tier (dialectological or orthographic). This tier then displays all of the corpus functions – it is possible to play parts of the recording by the segment, change settings to display other information, positional or structural units and attributes etc.
  
 ===== Acknowledgements =====
Line 62: Line 61:
  
 <WRAP round box 70%>
-[[en:cnk:dialekt:pravidla|Transcription in the DIALEKT corpus]] • [[en:cnk:dialekt:prace|Working with the DIALEKT corpus]] • [[en:cnk:ortofon|ORTOFON]] • [[en:cnk:diakorp|DIAKORP]] • [[en:pojmy:synchronni|Synchronic corpora]] • [[en:pojmy:reprezentativnost|Representativity]] • [[en:pojmy:diachronni|Diachrony, diachronic corpora]] • [[en:cnk:struktura#korpusy_mluvene|Spoken corpora]] • [[en:cnk:lemtag_mluv|Lemmatization and tagging in spoken corpora]]
+[[en:cnk:ortofon|ORTOFON]] • [[en:cnk:diakorp|DIAKORP]] • [[en:cnk:lemtag_mluv|Lemmatization and tagging in spoken corpora]]
 </WRAP>