
Differences

This shows you the differences between two versions of the page.

en:cnk:capek [2019/12/19 14:12] – created Michal Křen
en:cnk:capek [2019/12/19 14:14] (current) – [Corpora of texts by Karel Čapek] Michal Křen
Line 2:
 ====== Corpora of texts by Karel Čapek ======
  
-'capek' a 'capek_uplny' are author corpora of texts written by [[https://en.wikipedia.org/wiki/Karel_%C4%8Capek|Karel Čapek]] that have been created as the data source of [[https://www.nln.cz/knihy/slovnik-karla-capka/|Dictionary of Karel Čapek]]. 'capek' corpus contains all texts that have been undoubtedly written by himself (i.e. with no co-authors and without possible influence of a partner or translation original), while 'capek_uplny' corpus is a full collection of all the texts that Karel Čapek participated on (e.g. including texts co-authored by his brother Josef). This division is retained to keep the correspondence to the dictionary, and it also means that 'capek' is a subset of 'capek_uplny'.
+'capek' a 'capek_uplny' are author corpora of texts written by [[https://en.wikipedia.org/wiki/Karel_%C4%8Capek|Karel Čapek]] that have been created as the data source of [[https://www.nln.cz/knihy/slovnik-karla-capka/|Dictionary of Karel Čapek]]. 'capek' corpus contains all texts that have been undoubtedly written by himself (i.e. with no co-authors and without possible influence of a partner or translation original), while 'capek_uplny' corpus is a full collection of all the texts that Karel Čapek participated on (e.g. including texts co-authored by his brother Josef). The two corpora have been retained to keep the correspondence to the dictionary, and as a consequence, 'capek' is a subset of 'capek_uplny'.
  
-Both corpora were taken over from the CD that accompanied the dictionary, there were no changes neither to the metadata nor to the lemmatization and morphological tagging of the texts. This means that the annotation does not correspond to the contemporary standards of annotation of the CNC corpora, but on the other hand it made it possible to preserve the results of demanding manual lemmatization that has been carried out before the publication of the dictionary.
+Both corpora were taken over from the CD that accompanied the dictionary, there were no changes neither to the metadata nor to the lemmatization and morphological tagging of the texts. This means that the annotation may not correspond to the contemporary standards of annotation of the CNC corpora, but on the other hand it made it possible to preserve the results of demanding manual lemmatization that has been carried out before the publication of the dictionary.
  
 ===== How to cite =====