Differences

This shows you the differences between two versions of the page.

en:cnk:oral [2017/07/18 14:55] – [Modification of sociolinguistic data] michalkren
en:cnk:oral [2023/11/20 12:35] (current) – [ORAL Corpus] michalkren
Line 1: Line 1:
 ====== ORAL Corpus  ======
-The ORAL corpus is a corpus containing the transcribed recordings of predominantly informal conversations taking place between native speakers of Czech from all regions of the Czech Republic. The speakers knew each other very well (they were either friends or family members) and they were recorded in their natural environment. The recordings were made over the course of ten years, between 2002 and 2011. The corpus is not balanced, with the majority of the data originating from the Bohemia region of the Czech Republic (for more visit the [[en:cnk:struktura_oral|corpus structure]]). There is only one level of transcription, and wherever it was possible, it was unified along with tokenization for all parts of the corpus.
+The ORAL corpus is a corpus containing the transcribed recordings of predominantly informal conversations taking place between native speakers of Czech from all regions of the Czech Republic. The speakers knew each other very well (they were either friends or family members) and they were recorded in their natural environment. The recordings were made over the course of ten years, between 2002 and 2011. The corpus is not balanced, with the majority of the data originating from the Bohemia region of the Czech Republic (for more visit the [[cnk:struktura_oral|corpus structure]]; Czech only). There is only one level of transcription, and wherever it was possible, it was unified along with tokenization for all parts of the corpus.
 The ORAL corpus unifies the corpora [[en:cnk:oral2006|ORAL2006]], [[en:cnk:oral2008|ORAL2008]], [[en:cnk:oral2013|ORAL2013]] and the as yet unpublished recordings ORAL-Z. The overall size of the corpus is 5 368 391 words, with a total recording time of 582 hours. Part of the transcripts are not linked to the audio (data from the corpora ORAL2006 and ORAL2008). The corpus is [[en:cnk:lemtag_mluv|lemmatized and morphologically tagged]]. It uses the same type of [[en:seznamy:tagy|morphological tagging]] as the contemporary written corpora.
  
Line 10: Line 10:
 ^ Number of [[en:pojmy:atributy_strukturni#struktura_korpusu_mluvene_cestiny|recorded conversations]] |  1 546 |
 ^ Number of [[en:pojmy:atributy_strukturni#struktura_korpusu_mluvene_cestiny|speaking turns]] |  696 918 |
-^ Number of unique (different) speakers |  1 297 |
+^ Number of speakers |  2 807 |
 ^ Length of recordings for ORAL2013 + ORAL-Z [hh:mm:ss.ms] |  354:44:36.722 |
 </WRAP>
Line 76: Line 76:
 Kopřivová, M. - Lukeš, D. - Komrsková, Z. - Poukarová, P. - Waclawičová, M. - Benešová, L. – Křen, M.: //ORAL: korpus neformální mluvené češtiny, verze 1 z 2. 6. 2017//. Ústav Českého národního korpusu FF UK, Praha 2017. Retrieved from: http://www.korpus.cz
  
-Kopřivová, M. - Lukeš, D. - Komrsková, Z. - Poukarová, P.: Korpus ORAL: sestavení, lemmatizace a morfologické značkování. In //Korpus - Gramatika - Axiologie// 2017 (in print).
+Kopřivová, M. - Lukeš, D. - Komrsková, Z. - Poukarová, P. (2017): Korpus ORAL: sestavení, lemmatizace a morfologické značkování. In //Korpus - Gramatika - Axiologie// 15, 47-67.
  
 Lukeš. D. - Klimešová, P. - Komrsková, Z. - Kopřivová, M. (2015) : Experimental Tagging of the ORAL Series Corpora: Insights on Using a Stochastic Tagger. In: //TSD 2015//, Ed. P. Král a V. Matoušek. Springer international Publishing, 342-350.