Corpora of texts by Karel Čapek
'capek' and 'capek_uplny' are author corpora of texts written by Karel Čapek, created as the data source for the Dictionary of Karel Čapek (Slovník Karla Čapka). The 'capek' corpus contains only texts that were undoubtedly written by Čapek himself (i.e. with no co-authors and without the possible influence of a partner or of a translation original), while the 'capek_uplny' corpus is a full collection of all the texts that Karel Čapek participated in (e.g. including texts co-authored with his brother Josef). Both corpora have been retained to keep the correspondence with the dictionary; as a consequence, 'capek' is a subset of 'capek_uplny'.
Both corpora were taken over from the CD that accompanied the dictionary; no changes were made to the metadata or to the lemmatization and morphological tagging of the texts. This means that the annotation may not correspond to the current annotation standards of the CNC corpora, but it preserves the results of the demanding manual lemmatization that was carried out before the dictionary was published.
How to cite
Čermák, F. et al.: Capek: korpus pouze vlastních textů Karla Čapka. Ústav Českého národního korpusu FF UK, Praha 2007. Available from WWW:
Čermák, F. et al.: Capek_uplny: korpus všech textů Karla Čapka. Ústav Českého národního korpusu FF UK, Praha 2007. Available from WWW:
Čermák, F. (ed.): Slovník Karla Čapka. Nakladatelství Lidové noviny, Praha 2007.