Brno Spoken Corpus

Name: BMK
Number of positions (tokens): 596 009
Number of positions (tokens) without punctuation and other marks: 500 460
Number of word forms (words): 39 615
Number of recordings of dialogues: 250
Number of utterances: 27 921
Number of speakers: 294
Length of recordings in mins.: N/A

The Brno Spoken Corpus (BMK) is the first corpus within the Czech National Corpus to cover spoken Czech from the Moravian regions. It records authentic spoken language in the city of Brno and is not thematically specialised. The BMK is an electronic transcript of 250 anonymous recordings made in 1994–1999, capturing 294 speakers.

The considerable variety of spoken Czech in the Brno region reflects the complexity of the city's social structure, the central position of Brno within Moravia (inhabitants from the entire region, which is still rather differentiated dialectally, mix here) and its geographical closeness to Bohemia. The everyday speech of Brno inhabitants mixes, in particular, the Central Moravian interdialect and the spreading colloquial Czech, whose features are similar to those of the traditional dialect of the city area. The vocabulary shows clear relics of the former coexistence of Brno Czech with German, as well as the influence of Brno slang (hantec). Spoken language in Brno also reflects the Moravia-wide tendency towards a wider functional use of standard Czech.

The BMK was created in compliance with the principles of the PMK (Prague Spoken Corpus), which means that it covers four sociolinguistic variables in balanced proportions: the speaker's gender, age and education, and the type of speech. The recordings aimed at a representative occurrence of all combinations. Gender is marked by the abbreviations M and Z (male, female). Age is marked by I and V (Iunior, Vetus), that is younger and older; the lower limit was c. 20 years of age (the language of adolescents is not fully stabilised) and the dividing line between the two groups was the age of c. 35. Education is marked by B and A (Basis, Altus), that is lower, including both elementary and secondary education, and higher, that is university education. The last variable, marked by F and N, distinguishes formal and informal speech.

The BMK contains 135 formal recordings and 115 informal recordings. The formal speech is a monologue made up of a succession of replies to questions from a uniform list, covering broad topics such as school, youth, work or family. The questions were asked in standard Czech (unlike those in the PMK) and were neither recorded nor transcribed; only rare, individual clarifications by the interviewer were transcribed and marked as T.

The informal speeches were conversations of two or more people who knew each other well and chose the topics of their conversations themselves. One of the participants was usually also a respondent in the formal recordings, which makes it possible to observe the differences between the Czech used in unofficial and semi-official situations. In many cases the recordings were made with a hidden microphone (the speakers approved their publication afterwards), so they guarantee a high degree of authenticity and spontaneity in the use of language.
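Because each of the four variables described above takes one of two values, there are 2 × 2 × 2 × 2 = 16 possible combinations. The following is a minimal sketch of that scheme in Python. The letters are the abbreviations given in the text; joining them into a single four-letter string such as "ZIAN", and the names `VARIABLES`, `all_combinations` and `decode`, are assumptions made for illustration only and do not necessarily match how the corpus itself stores its annotation.

```python
from itertools import product

# Abbreviations as given in the text; the English glosses paraphrase it.
# Key order (gender, age, education, speech) is an illustrative assumption.
VARIABLES = {
    "gender": {"M": "male", "Z": "female"},
    "age": {"I": "younger (c. 20 to c. 35)", "V": "older (over c. 35)"},
    "education": {"B": "lower (elementary or secondary)", "A": "higher (university)"},
    "speech": {"F": "formal", "N": "informal"},
}


def all_combinations():
    """Return every combination of one letter per variable as a 4-letter string."""
    # Iterating over each inner dict yields its keys, i.e. the letters.
    return ["".join(letters) for letters in product(*VARIABLES.values())]


def decode(code):
    """Decode a hypothetical 4-letter code such as 'ZIAN' into labelled values."""
    if len(code) != len(VARIABLES):
        raise ValueError(f"expected {len(VARIABLES)} letters, got {len(code)}")
    decoded = {}
    for (name, values), letter in zip(VARIABLES.items(), code):
        if letter not in values:
            raise ValueError(f"{letter!r} is not a valid value for {name}")
        decoded[name] = values[letter]
    return decoded


if __name__ == "__main__":
    print(len(all_combinations()))  # 16
    print(decode("ZIAN"))
```

Under these assumptions, `decode("ZIAN")` describes a younger female speaker with university education in an informal recording.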

The BMK was compiled, to varying degrees, mainly by Zdeňka Hladká, Dana Hlaváčková, Daniel Jedlička and Táňa Vykypělová from the Faculty of Arts, Masaryk University in Brno; many students of the Faculty of Arts and the Faculty of Social Studies of Masaryk University also helped with the recordings.

Zdeňka Hladká (the BMK guarantor)

Citing BMK

Hladká, Z.: BMK (Brněnský mluvený korpus): přepisy nahrávek brněnské mluvy z 90. let 20. století. Ústav Českého národního korpusu FF UK, Praha 2002. Available on-line: http://www.korpus.cz.

See also