Differences

This shows you the differences between two versions of the page.

--- en:cnk:orator [2019/12/19 15:05] – mariekoprivova
+++ en:cnk:orator [2021/03/08 13:26] (current) – [How to cite] zuzanakomrskova
@@ Line 1: / Line 1: @@
 ====== Corpus of monologues: ORATOR ======
-The corpus ORATOR contains monologues by native Czech speakers. Speaking conditions are known in advance and the spokesperson can prepare. The spokesperson has a predetermined time-space in which he or she can and must create his or her speech. Data of this type has not yet been available in spoken Czech corpora.
+<WRAP right 35%>
+^ <fs medium>Name</fs> | <fs medium>[[cnk:orator|ORATOR]]•v1</fs> | <fs medium>[[cnk:orator|ORATOR]]•v2</fs> |
+^ Number of [[pojmy:token|positions (tokens)]] | 736 407 | 1 535 609 |
+^ Number of [[pojmy:token|positions (tokens)]] without punctuation, hesitations and interjections | 578 398 | 1 207 255 |
+^ Number of [[pojmy:word| word forms (word)]] | 60 952 | 97 816 |
+^ Number of [[pojmy:atributy_strukturni#struktura_korpusu_mluvene_cestiny|conversations recorded]] | 318 | 489 |
+^ Number of [[pojmy:atributy_strukturni#struktura_korpusu_mluvene_cestiny|utterances]] | 68 727 | 147 867 |
+^ Number of unique (different) speakers| 332 | 468 |
+^ Length of recordings [hh:mm:ss.ms] | 72:07:47.368 | 148:51:51.56 |
+</WRAP>
-Transcription rules, linking to the corresponding audio track and most metadata are the same as in the corpus [[en:cnk:ortofon|ORTOFON]] and the corpus [[en:cnk:oral|ORAL]]. The corpus is [[en:cnk:lemtag_mluv|lemmatized morphologically tagged]] in the same manner as the ORAL corpus and ORTOFON corpus. The corpus in not balanced.
+The ORATOR corpus contains monologues by native Czech speakers. The typical situations include a lecture, instruction, guided tour, welcome address, sermon etc. The corpus is not balanced in any way. The speech is usually prepared and the speaker has to fit within the given time frame. To our knowledge, there is no corpus with this kind of data available for Czech.
-Attributes for the corpus ORATOR: [[pojmy:atributy_strukturni|website with structural attributes]].
+Transcription rules, linking to the corresponding audio track and most metadata follow the [[en:cnk:ortofon|ORTOFON]] and [[en:cnk:oral|ORAL]] corpora; the structural attributes used in ORATOR are described [[pojmy:atributy_strukturni|here]] (Czech only). The corpus is [[en:cnk:lemtag_mluv|lemmatized and morphologically tagged]] in the same way as the ORAL and ORTOFON corpora.
-<WRAP right 35%>
+An updated version 2 of this corpus was published in 2020, with more than twice as much data and featuring many small improvements in the consistency of the transcription and in the annotation of the corpus.
-^ <fs medium>Name</fs> | <fs medium>[[cnk:orator|ORATOR]]</fs> |
-^ Number of [[pojmy:token|positions (tokens)]] | 736 407 |
+===== How to cite =====
-^ Number of [[pojmy:token|positions (tokens)]] without punctuation, hesitations and interjections | 578 398 |
-^ Number of [[pojmy:word| word forms (word)]] | 60 952 |
+<WRAP round tip 70%>
-^ Number of [[pojmy:atributy_strukturni#struktura_korpusu_mluvene_cestiny|conversations recorded]] | 318 |
+Kopřivová, M. – Laubeová, Z. – Lukeš, D. – Poukarová, P.: //ORATOR v2: Korpus monologů//. Ústav Českého národního korpusu FF UK, Praha 2020. Retrieved from [[https://www.korpus.cz]].
-^ Number of [[pojmy:atributy_strukturni#struktura_korpusu_mluvene_cestiny|utterances]] | 68 727 |
-^ Number of unique (different) speakers| 332 |
+Kopřivová, M. – Laubeová, Z. – Lukeš, D. – Poukarová, P.: //ORATOR v1: Korpus monologů//. Ústav Českého národního korpusu FF UK, Praha 2019. Retrieved from [[https://www.korpus.cz]].
-^ Length of recordings [hh:mm:ss.ms] | 72:07:47.368 |
 </WRAP>
