ParlCorp is a corpus of speeches delivered in the lower chamber of the Czech Parliament (Poslanecká Sněmovna). Its core consists of electronic transcripts of parliamentary debates, which are publicly available at www.psp.cz. The corpus aims to make parliamentary speeches accessible not only for linguistic research but also to researchers in the humanities and social sciences.
| Name | | Parlcorp |
|---|---|---|
| Positions | Number of positions (tokens) | 38 591 592 |
| | Number of word forms | 310 694 |
| | Number of lemmas | 98 820 |
| Structures | Number of speeches `<sp>` | 166 754 |
| | Number of speakers (women) | 239 |
| | Number of speakers (men) | 1 004 |
| | Number of sentences `<s>` | 1 750 728 |
| Further information | Reference corpus | NO (version 2) |
| | Time period | 1993–2021 |
| | Publication year | 2021 |
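
The counts in the table can be combined into a few derived figures that give a rough sense of the corpus's shape. The following is a minimal sketch in Python: the constants are copied from the table, while the variable names and the `derived_stats` helper are illustrative assumptions and not part of any corpus tooling.

```python
# Figures copied from the table above; the names are illustrative only.
POSITIONS = 38_591_592      # number of positions (tokens)
SPEECHES = 166_754          # <sp> structures
SENTENCES = 1_750_728       # <s> structures
SPEAKERS_WOMEN = 239
SPEAKERS_MEN = 1_004


def derived_stats():
    """Return simple ratios computed from the published corpus counts."""
    speakers = SPEAKERS_WOMEN + SPEAKERS_MEN
    return {
        "positions_per_speech": POSITIONS / SPEECHES,
        "positions_per_sentence": POSITIONS / SENTENCES,
        "sentences_per_speech": SENTENCES / SPEECHES,
        "share_of_women_speakers": SPEAKERS_WOMEN / speakers,
    }


stats = derived_stats()
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

With the published counts, this works out to roughly 231 positions per speech, about 22 positions per sentence, and women accounting for about 19% of the speakers. These are averages over the whole corpus, which includes brief remarks as well as full speeches.
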
The texts also include brief remarks made in response to previous speakers or comments by the Chair of the session. The corpus thus covers a broad spectrum of parliamentary subgenres (parliamentary interpellations or Question Time, statements by the prime minister and government ministers, speeches during parliamentary deliberations, comments by the Chair of the session etc.).
Two types of metadata are available for each speech: information about the text and information about the speaker.
Berrocal, Martina – Berrocal, Manuel: ParlCorp: Corpus of Czech Parliamentary Speeches. Ústav Českého národního korpusu FF UK, Praha 2021. Available on-line: http://www.korpus.cz