Corpus of monologues: ORATOR
The ORATOR corpus contains monologues by native speakers of Czech. The speaking conditions are known in advance, so the speaker can prepare. The speaker has a predetermined time slot in which he or she can and must deliver the speech. Data of this type has not previously been available in corpora of spoken Czech.
The transcription rules, the linking to the corresponding audio track, and most of the metadata are the same as in the ORTOFON and ORAL corpora. The corpus is lemmatized and morphologically tagged in the same manner as the ORAL and ORTOFON corpora. The corpus is not balanced.
Name | ORATOR |
---|---|
Number of positions (tokens) | 736 407 |
Number of positions (tokens) without punctuation, hesitations and interjections | 578 398 |
Number of word forms (word) | 60 952 |
Number of conversations recorded | 318 |
Number of utterances | 68 727 |
Number of unique (different) speakers | 332 |
Length of recordings [hh:mm:ss.ms] | 72:07:47.368 |
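
The figures in the table can be combined into a few derived values, such as the share of tokens that remain once punctuation, hesitations and interjections are excluded, or the average length of a recording. The following is a minimal sketch in Python that uses only the numbers listed above; the variable and function names are illustrative assumptions and are not part of any corpus tool.

```python
# Figures copied from the table above (names are illustrative only).
TOKENS = 736_407
TOKENS_WITHOUT_PUNCT = 578_398  # excludes punctuation, hesitations, interjections
RECORDINGS = 318
UTTERANCES = 68_727
TOTAL_LENGTH = "72:07:47.368"   # hh:mm:ss.ms


def duration_to_seconds(text: str) -> float:
    """Convert an hh:mm:ss.ms string to a number of seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


total_seconds = duration_to_seconds(TOTAL_LENGTH)

# Share of tokens left after excluding punctuation, hesitations, interjections.
share_without_punct = TOKENS_WITHOUT_PUNCT / TOKENS

# Averages per recording.
mean_recording_minutes = total_seconds / RECORDINGS / 60
mean_utterances_per_recording = UTTERANCES / RECORDINGS

# Rough speech rate over the whole corpus, in tokens per minute of audio.
tokens_per_minute = TOKENS / (total_seconds / 60)

print(f"Share of tokens without punctuation etc.: {share_without_punct:.1%}")
print(f"Mean recording length: {mean_recording_minutes:.1f} min")
print(f"Mean utterances per recording: {mean_utterances_per_recording:.1f}")
print(f"Tokens per minute of audio: {tokens_per_minute:.0f}")
```

These values are simple ratios of the published totals. They say nothing about how individual recordings or speakers are distributed, which matters because the corpus is not balanced.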