Differences

This shows you the differences between two versions of the page.

en:cnk:ortofon [2017/07/18 14:33] – [Corpus composition and data collection] Michal Křen
en:cnk:ortofon [2017/11/07 18:18] – [How to cite] Michal Křen
Line 39:
   * **Multi-tier transcription**: The transcription of spoken language in the ORTOFON corpus was carried out on two tiers: **orthographic** and **phonetic**. The orthographic tier serves primarily to ease the understanding of and orientation in the recorded conversation, whereas the phonetic tier captures the actual realization of the utterance with the aid of a phonetic transcription. These two tiers are supplemented by an additional **metalanguage** tier, which captures the accompanying sounds produced by the speakers (e.g. laughter, coughing) or the present surroundings with a possible influence on the conversation (e.g. the sound of a telephone ringtone can lead to an interruption of the conversation).
   * **Pause punctuation based on pause length**: A section of the [[en:cnk:oral|ORAL]] corpus, specifically ORAL2013 and ORAL-Z, contains a pause punctuation based on the intuitive distinction between shorter and longer pauses based on the speech rate of the specific speaker. In the ORTOFON corpus, three types of pauses are distinguished based on temporal criteria: divides (less than 120 ms), pauses (120 ms - 2 s), long pauses (longer than 2 s).
-  * **Fully balanced corpus**:  In the ORTOFON corpus, each combination of the four sociolinguistic variables is represented by a group of the same size; compare this to [[en:cnk:oral2013#co_ma_oral2013_s_korpusy_oral2006_a_oral2008_spolecneho|ORAL2013]].
+  * **Full balance**:  In the ORTOFON corpus, each combination of the four sociolinguistic variables is represented by a group of the same size (cf. [[en:cnk:oral2013#co_ma_oral2013_s_korpusy_oral2006_a_oral2008_spolecneho|ORAL2013]]).
   * **Varied representation of speakers from all over the Czech Republic**: The demarcation of the individual dialectal regions is based on the dialect divisions used in [[http://cja.ujc.cas.cz/cja.html|Czech language atlas]]; however, the borders have been further refined (see [[en:cnk:dialekt#mapa_narecnich_oblasti_cr| the map of dialectal regions]]). During the process of data collection, care was taken to ensure variability of both the speakers and the municipalities from which they come.
   * **Extended segment for listening**: The segment of each separate transcript can be as long as 25 words, which improves the experience of listening to the audio segment.
-  * **Alternative way of marking overlaps**: Overlaps in the transcript are marked with square brackets and are not divided in the audio so that they can be heard better, compared to [[en:cnk:oral2013|ORAL2013]].
+  * **Alternative way of marking overlaps**: Overlaps in the transcript are marked with square brackets and are not divided in the audio so that they can be heard better (cf. [[en:cnk:oral2013|ORAL2013]])
-  * **Audio availability**: The entire ORTOFON corpus is linked with audio tracks, so it is possible to listen to the given concordance (for the corpus [[en:cnk:oral|ORAL]] this only applies to the ORAL-Z and ORAL2013 sections).
+  * **Availability of audio**: The entire ORTOFON corpus is linked with audio tracks, so it is possible to listen to the given concordance (for the corpus [[en:cnk:oral|ORAL]] this only applies to the ORAL-Z and ORAL2013 sections).
   * **New metainformation**: The scope of meta information collected regarding the recording and the individual speakers has been extended.
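
The temporal criteria for the three pause types listed above can be illustrated with a minimal sketch. The function name `classify_pause` and the returned labels are hypothetical and not part of any ORTOFON tooling. The handling of the exact boundary values is also an assumption: the text says only "less than 120 ms", "120 ms - 2 s" and "longer than 2 s", so both 120 ms and 2 s are treated here as ordinary pauses.

```python
DIVIDE_LIMIT_MS = 120      # below this: divide ("less than 120 ms")
LONG_PAUSE_LIMIT_MS = 2000  # above this: long pause ("longer than 2 s")


def classify_pause(duration_ms: float) -> str:
    """Map a pause duration in milliseconds to one of the three pause types.

    Assumption: the boundaries 120 ms and 2000 ms both count as "pause".
    """
    if duration_ms < 0:
        raise ValueError("pause duration cannot be negative")
    if duration_ms < DIVIDE_LIMIT_MS:
        return "divide"
    if duration_ms <= LONG_PAUSE_LIMIT_MS:
        return "pause"
    return "long pause"


if __name__ == "__main__":
    for ms in (80, 120, 950, 2000, 3500):
        print(ms, classify_pause(ms))
```

Keeping the two thresholds as named constants makes it easy to adjust the sketch if the corpus documentation specifies the boundary cases differently.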
  
Line 54:
 <WRAP round tip 70%> <WRAP round tip 70%>
 Kopřivová, M. – Komrsková, Z. – Lukeš, D. – Poukarová, P. – Škarpová, M.: //ORTOFON: Korpus neformální mluvené češtiny s víceúrovňovým přepisem//. Ústav Českého národního korpusu FF UK, Praha 2017. Retrieved from: http://www.korpus.cz
 +
 +Komrsková, Z. - Kopřivová, M. - Lukeš, D. - Poukarová, P. - Goláňová, H. (2017): New Spoken Corpora of Czech: ORTOFON and DIALEKT. //Jazykovedný časopis//, 68(2), 219-228. ISSN 0021-8897.
  
 Kopřivová M. – Goláňová H. – Klimešová P. – Komrsková Z. – Lukeš D. (2014): Multi-tier Transcription of Informal Spoken Czech: The ORTOFON Corpus Approach. In //Complex Visibles Out There//. Olomouc: Univerzita Palackého v Olomouci, 529-544.