Karel Havlíček’s Journalism Corpus

Karel Havlíček’s Journalism Corpus (KH-NOVINY) contains all journalistic texts written by Karel Havlíček (1821–1856) and published in his periodicals Pražské noviny (Prague Newspaper, 1846–1848), including its supplement Česká včela (The Czech Bee), and Národní noviny (National Newspaper, 1848–1850). The work of Karel Havlíček, the founder of modern Czech journalism, documents a period of substantial political and social change, when the press became an exceptionally open platform for voicing new political, economic and social opinions.

The corpus presents the articles in transliterated form. It is neither lemmatized nor tagged, and it comprises 1 182 159 positions.

The structural units and metadata used in this corpus are summarized in the following table:

structure  attribute               description
<doc.r>    date                    year of publication
<doc.m>    date                    month of publication
<doc.d>    date                    day of publication
<doc.t>    title                   newspaper title
<doc.c>    section                 newspaper section
<doc.h>    title                   name of article
<doc.a>    author                  author
<e>        amendment               change to the copy; damaged, unreadable, missing text
<f>        formatted text          graphically arranged text, tables, caption below a picture etc.
<g>        quotation               text written by someone else and quoted by Havlíček
<k>        chapter                 title, subchapter title, article title etc.
<n>        footnote
<o>        foreign-language text
<s>        page number
<v>        verse text
<z>        non-standard phenomena  phenomena such as spelling homonyms, misprints, word boundaries
<zav>      comment                 editor’s note
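
As a rough illustration of how the document-level metadata in the table could be used, the following Python sketch reads the doc attributes from a single opening tag and combines doc.r, doc.m and doc.d into a publication date. It assumes the metadata are serialized as attributes on an opening <doc> tag, which this description does not state. The sample tag, its attribute values and the function names are hypothetical and serve only as placeholders.

```python
import re

# Hypothetical opening tag: the serialization and the values are assumptions
# made for illustration; they are not taken from the corpus itself.
SAMPLE_TAG = '<doc r="1848" m="4" d="5" t="Národní noviny" a="Karel Havlíček">'

# Matches name="value" pairs inside a tag.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_doc_attributes(tag: str) -> dict[str, str]:
    """Return the attribute/value pairs found in one opening tag."""
    return dict(ATTR_RE.findall(tag))


def publication_date(attrs: dict[str, str]) -> tuple[int, int, int]:
    """Combine doc.r, doc.m and doc.d into a (year, month, day) tuple."""
    return int(attrs["r"]), int(attrs["m"]), int(attrs["d"])


attrs = parse_doc_attributes(SAMPLE_TAG)
print(attrs["t"], publication_date(attrs))
```

Keeping the three date attributes separate, as the table does, makes it straightforward to filter by year or month alone; the tuple is only a convenience for sorting documents chronologically.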

The corpus originated in 2017–2021 within the project 17-13671S (Karel Havlíček’s Journalism and Correspondence) of the Czech Science Foundation (GAČR).

How to cite the KH-NOVINY corpus

Genserová, B. – Hledíková, H. – Řehořková, A. – Stluka, M. et al.: KH-noviny: korpus publicistiky Karla Havlíčka. Ústav Českého národního korpusu FF UK, Praha 2021. Available at: <http://www.korpus.cz>