ParlCorp: Corpus of Czech Parliamentary Speeches
ParlCorp is a corpus of speeches delivered in the Lower Chamber of the Czech Parliament (Poslanecká Sněmovna). Its core consists of electronic transcripts of parliamentary debates, which are publicly available at www.psp.cz. The corpus aims to make parliamentary speeches accessible not only to linguists but also to researchers in the humanities and social sciences.
Name | ParlCorp | |
---|---|---|
Positions | Number of positions (tokens) | 38 591 592 |
Positions | Number of word forms | 310 694 |
Positions | Number of lemmas | 98 820 |
Structures | Number of speeches <sp> | 166 754 |
Structures | Number of female speakers | 239 |
Structures | Number of male speakers | 1 004 |
Structures | Number of sentences <s> | 1 750 728 |
Further information | Reference corpus | NO (version 2) |
Further information | Publication year | 2021 |
The texts also include brief remarks made in response to previous speakers or comments by the Chair of the session. The corpus thus covers a broad spectrum of parliamentary subgenres (parliamentary interpellations or Question Time, statements by the prime minister and government ministers, speeches during parliamentary deliberations, comments by the Chair of the session).
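The counts in the table above can be combined into a few rough averages. The following is a minimal sketch rather than an official statistic: the variable names are illustrative, and it assumes that "positions" include punctuation, so the averages are in positions rather than words.

```python
# Counts copied from the corpus table above.
TOKENS = 38_591_592
WORD_FORMS = 310_694
LEMMAS = 98_820
SPEECHES = 166_754
SENTENCES = 1_750_728
SPEAKERS_WOMEN = 239
SPEAKERS_MEN = 1_004

# Average length of a speech and of a sentence, in positions.
tokens_per_speech = TOKENS / SPEECHES
tokens_per_sentence = TOKENS / SENTENCES

# Average number of distinct word forms attested per lemma.
forms_per_lemma = WORD_FORMS / LEMMAS

# Share of women among individual speakers (people, not speeches).
share_women = SPEAKERS_WOMEN / (SPEAKERS_WOMEN + SPEAKERS_MEN)

print(f"positions per speech:   {tokens_per_speech:.1f}")
print(f"positions per sentence: {tokens_per_sentence:.1f}")
print(f"word forms per lemma:   {forms_per_lemma:.2f}")
print(f"share of women:         {share_women:.1%}")
```

Note that the share of women is computed over speakers, so it says nothing about how many speeches or positions each group contributes.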
Available metadata
Two types of metadata are available for each speech: information about the text and information about the speaker.
Text metadata
- Electoral period – 8 electoral periods (1993-1996, …, 2017-2021)
- Number of the parliamentary session
- Date of the session
- Topic of the deliberation
- Unique ID of the text
Speaker metadata
- name (e.g. Taťána Fischerová)
- sex – woman, man
- role – member of parliament, Prime Minister, government minister…
- party affiliation – political party or movement
- intervention order within the discussed topic
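The two groups of metadata listed above could be modelled as a simple record when working with exported speeches. This is a hedged sketch only: the class names, field names, the `speeches_by` helper and all sample values are hypothetical placeholders, not the attribute names or data actually used in the corpus.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TextMetadata:
    electoral_period: str   # e.g. "2017-2021"
    session_number: int
    session_date: date
    topic: str
    text_id: str            # unique ID of the text


@dataclass(frozen=True)
class SpeakerMetadata:
    name: str
    sex: str                # "woman" or "man"
    role: str               # e.g. "member of parliament"
    party: str
    intervention_order: int  # order within the discussed topic


@dataclass(frozen=True)
class Speech:
    text: TextMetadata
    speaker: SpeakerMetadata


def speeches_by(speeches, *, sex=None, electoral_period=None):
    """Return the speeches matching the given sex and/or electoral period."""
    return [
        s for s in speeches
        if (sex is None or s.speaker.sex == sex)
        and (electoral_period is None
             or s.text.electoral_period == electoral_period)
    ]


# Invented placeholder records, not real corpus data.
sample = [
    Speech(TextMetadata("2017-2021", 1, date(2018, 1, 1), "placeholder topic A", "text-0001"),
           SpeakerMetadata("Speaker A", "woman", "member of parliament", "Party X", 1)),
    Speech(TextMetadata("2017-2021", 1, date(2018, 1, 1), "placeholder topic A", "text-0002"),
           SpeakerMetadata("Speaker B", "man", "Prime Minister", "Party Y", 2)),
    Speech(TextMetadata("1993-1996", 5, date(1994, 1, 1), "placeholder topic B", "text-0003"),
           SpeakerMetadata("Speaker C", "woman", "government minister", "Party Z", 1)),
]

women_2017 = speeches_by(sample, sex="woman", electoral_period="2017-2021")
print([s.text.text_id for s in women_2017])
```

Keeping text and speaker metadata in separate records mirrors the two metadata types described above and lets either one be used as a filter on its own.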