— | en:cnk:oral2008 [2015/10/23 19:35] (current) – created Václav Horký | ||
---|---|---|---|
+ | ~~NOTOC~~ | ||
+ | ====== Corpus of Spoken Czech ORAL2008 ====== | ||
+ | <WRAP right 35%> | ||
+ | ^ <fs medium> | ||
+ | ^ Number of positions (tokens) | 1 349 536 | | ||
+ | ^ Number of positions (tokens) without punctuation and other marks | 1 000 097 | | ||
+ | ^ Number of word forms (words) | 65 778 | | ||
+ | ^ Number of recordings of dialogues | 297 | | ||
+ | ^ Number of utterances | 106 941 | | ||
+ | ^ Number of speakers | 995 | | ||
+ | ^ Length of recordings in minutes | 6 883 | | ||
+ | </ | ||
+ | |||
+ | **ORAL2008** is another spoken corpus available within the framework of the Czech National Corpus project. It aims to give an appropriate representation of authentic spoken language. The corpus is built from material recorded throughout Bohemia in 2002--2007, using the same repository of recordings and their transcriptions as its predecessor, | ||
+ | |||
+ | ORAL2008 is compiled from transcriptions of 297 recordings. All of the recordings were made in informal situations, meaning that the speakers knew each other and were on friendly terms. The total length of the recordings is 6 883 minutes (almost 115 hours), and they contain a total of 1 000 097 words uttered by 995 speakers. | ||
+ | |||
+ | The recordings were made and the transcriptions carried out by students of Prague and regional universities, | ||
+ | |||
+ | --- //Martina Waclawičová// | ||
+ | |||
+ | |||
+ | ===== Citing ORAL2008 ===== | ||
+ | |||
+ | <WRAP round tip 70%> | ||
+ | Waclawičová, | ||
+ | |||
+ | Waclawičová, | ||
+ | </ | ||
+ | |||
+ | ===== See also ===== | ||
+ | |||
+ | <WRAP round box 49%> | ||
+ | [[en: | ||
+ | </ |