
Corpus of monologues: ORATOR

The ORATOR corpus contains monologues by native speakers of Czech. The speaking conditions are known in advance, so the speaker can prepare. The speaker has a predetermined time slot within which he or she can and must deliver the speech. Data of this type have not previously been available in corpora of spoken Czech.

The transcription rules, the method of linking the transcription to the corresponding audio track, and most of the metadata are the same as in the ORTOFON corpus. ORATOR is also lemmatized and morphologically tagged in the same manner as the ORAL and ORTOFON corpora. The corpus is not balanced according to any criterion, and its extension is planned for 2020.

Name	ORATOR
Number of positions (tokens)	736 407
Number of positions (tokens) without punctuation, hesitations and interjections	578 398
Number of word forms (word)	60 952
Number of conversations recorded	318
Number of utterances	68 727
Number of unique (different) speakers	332
Length of recordings [hh:mm:ss.ms]	72:07:47.368
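
The figures in the table can be combined into a few rough derived ratios. The following is only an illustrative sketch in Python: the variable names and the helper `duration_to_seconds` are hypothetical, the numbers are copied from the table, and the ratios are not official statistics of the corpus.

```python
# Figures copied from the table above; all names here are illustrative only.
TOKENS = 736_407            # Number of positions (tokens)
TOKENS_NO_PUNCT = 578_398   # tokens without punctuation, hesitations, interjections
UTTERANCES = 68_727         # Number of utterances
RECORDINGS = 318            # "Number of conversations recorded" in the table
LENGTH = "72:07:47.368"     # Length of recordings [hh:mm:ss.ms]


def duration_to_seconds(text: str) -> float:
    """Convert an hh:mm:ss.ms string into seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


total_seconds = duration_to_seconds(LENGTH)
total_minutes = total_seconds / 60

# Rough averages; they ignore pauses and any differences between recordings.
tokens_per_utterance = TOKENS / UTTERANCES
words_per_minute = TOKENS_NO_PUNCT / total_minutes
minutes_per_recording = total_minutes / RECORDINGS

print(f"tokens per utterance:  {tokens_per_utterance:.1f}")
print(f"words per minute:      {words_per_minute:.1f}")
print(f"minutes per recording: {minutes_per_recording:.1f}")
```

These averages only summarize the table; they say nothing about how individual monologues vary in length or tempo.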
