ParlCorp: Corpus of Czech Parliamentary Speeches

ParlCorp is a corpus of speeches delivered in the lower chamber of the Czech Parliament, the Chamber of Deputies (Poslanecká sněmovna). Its core consists of electronic transcripts of parliamentary debates, which are publicly available at www.psp.cz. The corpus aims to make parliamentary speeches accessible to linguists as well as to researchers in the humanities and social sciences.

Name                            ParlCorp
Positions
  Number of positions (tokens)  38 591 592
  Number of word forms          310 694
  Number of lemmas              98 820
Structures
  Number of speeches <sp>       166 754
  Number of speakers (women)    239
  Number of speakers (men)      1 004
  Number of sentences <s>       1 750 728
Further information
  Reference corpus              no (version 2)
  Time period                   1993–2021
  Publication year              2021
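The summary counts support a few rough averages, such as typical speech length and sentence length. The following is a minimal Python sketch that derives them, assuming that positions include punctuation tokens, that every <sp> structure counts as one speech (short remarks included), and that each speaker is counted once across the whole 1993–2021 period. The constant names and the `corpus_ratios` function are illustrative only.

```python
# Figures copied from the ParlCorp summary table.
TOKENS = 38_591_592      # positions; assumed to include punctuation tokens
SPEECHES = 166_754       # <sp> structures, short remarks included
SENTENCES = 1_750_728    # <s> structures
SPEAKERS_WOMEN = 239
SPEAKERS_MEN = 1_004


def corpus_ratios() -> dict[str, float]:
    """Return rough averages derived from the summary counts."""
    speakers = SPEAKERS_WOMEN + SPEAKERS_MEN
    return {
        "tokens_per_speech": TOKENS / SPEECHES,
        "tokens_per_sentence": TOKENS / SENTENCES,
        "sentences_per_speech": SENTENCES / SPEECHES,
        # Share of distinct speakers, not of speeches or tokens.
        "share_of_women_speakers": SPEAKERS_WOMEN / speakers,
    }


if __name__ == "__main__":
    for name, value in corpus_ratios().items():
        print(f"{name}: {value:.2f}")
```

This gives about 231 tokens per speech, about 22 tokens per sentence, about 10.5 sentences per speech, and a women's share of roughly 19 % of all speakers. Because short remarks by the Chair are counted as speeches, the average understates the length of a typical full speech.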

Besides full speeches, the texts include brief remarks made in response to previous speakers and comments by the Chair of the session. The corpus thus covers a broad spectrum of parliamentary subgenres (parliamentary interpellations or Question Time, statements by the prime minister and government ministers, speeches during parliamentary deliberations, comments by the Chair of the session, etc.).

Available metadata

Two types of metadata are available for each speech: information about the text and information about the speaker.

Text metadata

  • Electoral period – 8 electoral periods (1993–1996, …, 2017–2021)
  • Number of the parliamentary session
  • Date of the session
  • Topic of the deliberation
  • Unique ID of the text

Speaker metadata

  • name – for example Taťána Fischerová
  • sex – woman, man
  • role – for example member of parliament, prime minister, government minister, etc.
  • party affiliation – political party or movement
  • intervention order – the position of the speaker's intervention within the debate on the given topic
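To make the two metadata layers concrete, the sketch below models one speech as a pair of records (text and speaker) and selects speeches by speaker sex and electoral period. It is a hypothetical illustration: the class names, field names, value formats, and the `speeches_by` helper are assumptions and do not reproduce the corpus's actual structure attributes or any query interface at www.korpus.cz.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TextMeta:
    """Text-level metadata of one speech (hypothetical field names)."""
    electoral_period: str      # e.g. "2017-2021"
    session_number: int
    session_date: date
    topic: str
    text_id: str               # unique ID of the text


@dataclass(frozen=True)
class SpeakerMeta:
    """Speaker-level metadata of one speech (hypothetical field names)."""
    name: str
    sex: str                   # "woman" or "man"
    role: str                  # e.g. "member of parliament"
    party: str                 # political party or movement
    intervention_order: int    # position of the intervention within the topic


@dataclass(frozen=True)
class Speech:
    text: TextMeta
    speaker: SpeakerMeta


def speeches_by(speeches: list[Speech], *, sex: str, period: str) -> list[Speech]:
    """Select speeches given by speakers of one sex in one electoral period."""
    return [
        s for s in speeches
        if s.speaker.sex == sex and s.text.electoral_period == period
    ]


if __name__ == "__main__":
    # Placeholder values only; not taken from the corpus.
    example = Speech(
        TextMeta("2017-2021", 1, date(2018, 1, 1), "example topic", "example-001"),
        SpeakerMeta("Example Speaker", "woman", "member of parliament", "Party A", 1),
    )
    print(len(speeches_by([example], sex="woman", period="2017-2021")))
```

Keeping text and speaker metadata in separate records mirrors the split described above: one speaker record can be reused across many speeches, while text metadata is specific to each speech.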

Citing ParlCorp

Berrocal, Martina – Berrocal, Manuel: ParlCorp: Corpus of Czech Parliamentary Speeches. Ústav Českého národního korpusu FF UK, Praha 2021. Available online: http://www.korpus.cz