en:cnk:jerome [2015/10/22 22:41] (current)
Václav Horký created; "(see Table 1)" deleted since there is no Table 1 nearby
 +====== Corpus Jerome ======
 +Corpus JEROME is a monolingual comparable corpus designed specifically for analyzing translated Czech. It comprises more than 85 million tokens (including punctuation) and covers both fiction and professional literature. As a comparable corpus, it contains translated and non-translated Czech in equal amounts. Note that the non-translated texts are not the source texts of the translations; they are texts written originally in Czech, and this part serves as a reference corpus.
 +Corpus JEROME is lemmatized and morphologically tagged in the same way as the [[en:cnk:SYN]] corpora. However, its annotation includes additional information potentially relevant for translation studies researchers: first edition (prvnivyd), sex of the author (autor_sex), and sex of the translator (preklad_sex).
 +JEROME provides a unique source of data for translation studies scholars, linguists, and anyone interested in what translated Czech looks like. It is well suited to quantitative analyses as well as small-scale qualitative case studies (e.g. a study of translations made by female translators).
 +Corpus JEROME also includes a subcorpus balanced by source language (texts of almost equal length from each language). This subcorpus is inevitably smaller (5 million tokens), but it is well suited to verifying how universal a finding is, e.g. when analyzing features known as translation universals.
 +===== Citing JEROME =====
 +<WRAP round tip 60%>
 +Chlumská, L.: //JEROME: jednojazyčný srovnatelný korpus pro výzkum překladové češtiny//. Ústav Českého národního korpusu FF UK, Praha 2013. Available on-line: http://
 +--- //Lucie Chlumská//