Karel Havlíček’s Journalism Corpus
Karel Havlíček’s Journalism Corpus (KH-NOVINY) contains all journalistic texts written by Karel Havlíček (1821–1856) and published in his periodicals Pražské noviny (Prague Newspaper, 1846–1848), including its supplement Česká včela (The Czech Bee), and Národní noviny (National Newspaper, 1848–1850). The activities of Karel Havlíček, the founder of modern Czech journalism, document the history of substantial political and social changes in an era when the press became an exceptionally open platform for voicing new political, economic and social opinions.
The corpus captures the articles in transliterated form. It is neither lemmatized nor tagged and contains 1,182,159 positions.
The structural units and metadata used in this corpus are summarized in the following table:
structure | attribute | description |
---|---|---|
<doc.r> | date | year of publication |
<doc.m> | date | month of publication |
<doc.d> | date | day of publication |
<doc.t> | title | newspaper title |
<doc.c> | section | newspaper section |
<doc.h> | title | name of article |
<doc.a> | author | author |
<e> | amendment | change to the copy; damaged, unreadable, missing text |
<f> | formatted text | graphically arranged text, tables, captions below pictures etc. |
<g> | quotation | text written by someone else and quoted by Havlíček |
<k> | chapter title, subchapter title, article title etc. | |
<n> | footnote | |
<o> | foreign-language text | |
<s> | page number | |
<v> | verse text | |
<z> | non-standard phenomena | phenomena such as spelling homonyms, misprints, word boundaries |
<zav> | comment | editor’s note |
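The publication date is split across three entries in the table above (doc.r, doc.m and doc.d). The following minimal Python sketch shows one way these could be combined into a single date when working with exported data. It assumes that the metadata are serialized as attributes `r`, `m`, `d`, `t`, `c`, `h` and `a` on a `<doc>` opening tag and that the date values are numeric. The sample tag and the function names `parse_doc_tag` and `publication_date` are hypothetical and not taken from the corpus.

```python
import re
from datetime import date

# Hypothetical <doc> opening tag; the real serialization used by
# KH-NOVINY is an assumption here, and the values are made up.
SAMPLE_TAG = '<doc r="1848" m="4" d="5" t="Národní noviny" c="" h="" a="Karel Havlíček">'

# Matches name="value" pairs inside the tag.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_doc_tag(tag: str) -> dict[str, str]:
    """Return the attribute/value pairs found in a <doc ...> opening tag."""
    return dict(ATTR_RE.findall(tag))


def publication_date(attrs: dict[str, str]) -> date:
    """Combine r (year), m (month) and d (day) into one date object."""
    return date(int(attrs["r"]), int(attrs["m"]), int(attrs["d"]))


attrs = parse_doc_tag(SAMPLE_TAG)
print(attrs["t"], publication_date(attrs).isoformat())
```

If the date attributes turn out to be non-numeric or partly missing in the real data, `publication_date` would need additional handling.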
The corpus originated in 2017–2021 within the project 17-13671S (Karel Havlíček’s Journalism and Correspondence) of the Czech Science Foundation (GAČR).
How to cite Corpus KH-NOVINY
Genserová, B. – Hledíková, H. – Řehořková, A. – Stluka, M. et al.: KH-noviny: korpus publicistiky Karla Havlíčka. Ústav Českého národního korpusu FF UK, Praha 2021. Dostupný z WWW: <http://www.korpus.cz>