ParlCorp: Corpus of Czech Parliamentary Speeches

ParlCorp is a corpus of speeches delivered in the Lower Chamber of the Czech Parliament (Poslanecká Sněmovna). Its core consists of electronic transcripts of parliamentary debates, which are publicly available at www.psp.cz. The corpus aims to make parliamentary speeches accessible for linguistic research as well as for researchers in the humanities and social sciences.

Name		Parlcorp
Positions	Number of positions (tokens)	38 591 592
	Number of word forms	310 694
	Number of lemmas	98 820
Structures	Number of speeches <sp>	166 754
	Number of speakers (women)	239
	Number of speakers (men)	1004
	Number of sentences <s>	1 750 728
Further information	Reference corpus	NO (version 2)
	Publication year	2021
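
The counts in the table can be combined into a few rough averages, such as positions per sentence or the share of women among speakers. The following is a minimal sketch in Python that uses only the figures listed above. The constant and function names are illustrative and not part of any corpus tool. It assumes that the two speaker counts can simply be added to obtain the total number of speakers. Positions (tokens) typically include punctuation, so the per-sentence average should not be read as a word count.

```python
# Figures copied from the statistics table above.
TOKENS = 38_591_592
SPEECHES = 166_754
SENTENCES = 1_750_728
SPEAKERS_WOMEN = 239
SPEAKERS_MEN = 1_004


def derived_ratios() -> dict[str, float]:
    """Return a few averages derived from the published counts."""
    # Assumption: every speaker is counted exactly once in one of the two groups.
    speakers = SPEAKERS_WOMEN + SPEAKERS_MEN
    return {
        "tokens_per_sentence": TOKENS / SENTENCES,
        "tokens_per_speech": TOKENS / SPEECHES,
        "sentences_per_speech": SENTENCES / SPEECHES,
        "share_of_women_speakers": SPEAKERS_WOMEN / speakers,
    }


if __name__ == "__main__":
    for name, value in derived_ratios().items():
        print(f"{name}: {value:.2f}")
```

These are corpus-wide averages only. They say nothing about how speech length varies across speakers or subgenres.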

The texts also include brief remarks made in response to previous speakers, as well as comments by the Chair of the session. The corpus thus covers a broad spectrum of parliamentary subgenres: parliamentary interpellations (Question Time), statements by the prime minister and government ministers, speeches during parliamentary deliberations, and comments by the Chair of the session.