en:cnk:ksp [2022/08/15 13:04] – created michalskrabal
en:cnk:ksp [2023/05/18 16:22] (current) michalkren
====== Corpus of Contemporary Czech Poetry (C3P) ======
  
C3P is a joint project of the [[https://service.ucl.cas.cz/en/|Institute of Czech Literature of CAS]] and the Institute of the Czech National Corpus, and it dates back to 2015. As the name suggests, it is a corpus of contemporary Czech poetry (delimited by the years 1990 and 2020), i.e. a representative sample of Czech poetry from the last three decades. Significantly, this sample includes not only texts officially published in poetry books, which have thus gone through the standard editorial process, but also amateur works, found mainly on so-called literary forums. This methodological decision is not driven by a desire to democratise poetry. We believe that without texts from the Internet, the picture of contemporary Czech poetry would be incomplete, covering only one segment of it, and a proportionally rather small one. Such a picture would not reflect the fact that literary forums have played a significant role in the Czech literary context((PIORECKÝ, Karel. Česká literatura a nová média. Praha: Academia, 2016.)), among other things as a platform where some now established poets first published their work. This basic dichotomy also makes it possible to confront and compare the two modes of publication, which C3P distinguishes by the ''doc.medium'' attribute (''print'' vs. ''web'').
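
In KonText, the ''doc.medium'' attribute can be used to restrict a query to one of the two parts. The Python sketch below only builds CQL query strings and does not contact KonText. It assumes that the attribute is available as ''medium'' on the ''doc'' structure with the values ''print'' and ''web'', and that the corpus has a ''lemma'' attribute. The helper name ''medium_query'' is hypothetical.

```python
# Hypothetical helper: builds CQL query strings; it does not contact KonText.
# Assumptions: the positional attribute "lemma" exists and doc.medium is
# the attribute "medium" of the structure "doc", with values "print" and "web".
MEDIA = ("print", "web")


def medium_query(lemma: str, medium: str) -> str:
    """Return a CQL query for `lemma` restricted to documents of one medium."""
    if medium not in MEDIA:
        raise ValueError(f"medium must be one of {MEDIA}, got {medium!r}")
    # Escape double quotes so the lemma cannot break out of the CQL string.
    # The value is otherwise used as-is, i.e. it is read as a regular expression.
    escaped = lemma.replace('"', '\\"')
    return f'[lemma="{escaped}"] within <doc medium="{medium}" />'


if __name__ == "__main__":
    # The same search, built once for each part of C3P.
    for medium in MEDIA:
        print(medium_query("láska", medium))
```

Running the two resulting queries separately and comparing their relative frequencies is one way to confront the print and web modes.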
  
<WRAP right 35%>
^ <fs medium>Name</fs> ^^ <fs medium>C3P</fs> ^
^ ::: ^ Number of tokens | 42 435 867 |
^ ::: ^ Number of tokens (excl. punctuation) | 35 506 057 |
^ ::: ^ Number of word forms | 900 203 |
===== The composition of C3P =====
  
C3P currently contains approximately 35.5 million running words (tokens excluding punctuation). The print subcorpus includes about 1.7 million words from 21,478 poems printed in 496 poetry collections by 209 authors. The web subcorpus contains more than 442,000 poems from six literary forums (liter.cz, pismak.cz, totem.cz, libres.cz, psanci.cz, xxvi.cz), comprising over 34 million words. The texts in the print subcorpus were selected with regard to the generational layers of the contemporary poetry scene. It currently represents authors of Generations X and Y (i.e. born after 1965), and we are continuing to expand the corpus towards older generations.
  
For details on how C3P was built, see the studies below.
  
===== The annotation of C3P =====
  
The [[http://versologie.cz/v2/web_content/tagset.php?lang=cz|tagset]] was adopted with minimal modifications from an earlier project, the [[https://versologie.cz/v2/web_content/corpus.php?lang=en|Corpus of Czech Verse]]. In addition, C3P was tagged with the standard CNC annotation tools.
  
===== How to use C3P =====
  
C3P data can be explored in various ways. Besides standard concordance searches in the [[https://www.korpus.cz/kontext/query?corpname=KSP|KonText interface]], the following tools are available:
  
  * [[https://trost.korpus.cz/slovo-v-poezii/|Word in Poetry]]: a tool suitable for a first introduction to the corpus. After a search word is entered, it offers previews of the other applications and a range of statistical data.
  * [[https://versologie.cz/ksp/tool_hex/index.php?lang=cz|Hex]]: an application for searching for keywords, i.e. words whose frequency is significantly higher in a given poem than in the whole corpus, which makes it particularly useful for thematic analyses.
  * [[https://versologie.cz/ksp/tool_gunstick/index.php?lang=cz|Gunstick]]: a tool for searching for rhyme pairs that also provides statistics on their frequency.
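
Hex is described above as finding words that are significantly more frequent in a poem than in the whole corpus. The page does not say which statistic Hex uses, so the following Python sketch only illustrates one common keyness measure, Dunning's log-likelihood (G2). It is not Hex's actual implementation. Assumptions of this sketch: the poem is compared against the rest of the corpus (corpus counts minus the poem's own counts), only over-represented words are kept, and the cut-off 3.84 corresponds to p < 0.05 with one degree of freedom. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(freq_poem: int, size_poem: int, freq_ref: int, size_ref: int) -> float:
    """Dunning's log-likelihood (G2) for one word: poem vs. reference corpus."""
    total_freq = freq_poem + freq_ref
    total_size = size_poem + size_ref
    # Frequencies expected if the word were equally common in both samples.
    expected_poem = size_poem * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    g2 = 0.0
    # An observed frequency of zero contributes nothing (0 * log 0 is taken as 0).
    if freq_poem > 0:
        g2 += freq_poem * math.log(freq_poem / expected_poem)
    if freq_ref > 0:
        g2 += freq_ref * math.log(freq_ref / expected_ref)
    return 2.0 * g2


def keywords(poem_tokens: list[str], corpus_counts: Counter,
             threshold: float = 3.84) -> list[tuple[str, float]]:
    """Rank words over-represented in the poem relative to the rest of the corpus.

    `corpus_counts` are frequencies in the whole corpus, poem included;
    the poem's own counts are subtracted to obtain the reference corpus.
    """
    poem_counts = Counter(poem_tokens)
    size_poem = sum(poem_counts.values())
    size_ref = sum(corpus_counts.values()) - size_poem
    if size_poem == 0 or size_ref <= 0:
        raise ValueError("need a non-empty poem and a corpus larger than the poem")
    scored = []
    for word, freq_poem in poem_counts.items():
        freq_ref = corpus_counts[word] - freq_poem
        # Skip words that are not relatively more frequent in the poem.
        if freq_poem / size_poem <= freq_ref / size_ref:
            continue
        g2 = log_likelihood(freq_poem, size_poem, freq_ref, size_ref)
        if g2 >= threshold:
            scored.append((word, g2))
    # Strongest keywords first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Purely hypothetical toy data: a three-token "poem" and a 10,000-token "corpus".
    poem = ["srdce", "srdce", "a"]
    corpus = Counter({"srdce": 2, "a": 1000, "jiné": 8998})
    for word, score in keywords(poem, corpus):
        print(f"{word}\t{score:.2f}")
```

With the toy data, only ''srdce'' passes the threshold: ''a'' is relatively more frequent in the poem, but not significantly so, given its high frequency in the rest of the corpus.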
  
All of the above tools can work either with the whole corpus or separately with one of its parts (''web'': poetry from internet literary forums; ''print'': poetry published in books). More tools for working with C3P will be added gradually.
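
The two parts differ greatly in size (about 1.7 million words in print vs. over 34 million on the web, as stated in the composition section), so raw frequencies obtained from them are not directly comparable. A common remedy is to normalize counts to instances per million words (ipm). The Python sketch below uses the rounded subcorpus sizes given on this page and hypothetical word counts. For real work, the exact subcorpus sizes reported by KonText should be used instead. The function names are illustrative.

```python
# Approximate subcorpus sizes in words, as stated on this page (rounded values).
SUBCORPUS_SIZES = {
    "print": 1_700_000,
    "web": 34_000_000,
}


def ipm(count: int, corpus_size: int) -> float:
    """Relative frequency in instances per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count / corpus_size * 1_000_000


def compare_media(counts: dict[str, int]) -> dict[str, float]:
    """Map each medium ("print"/"web") to the ipm of a word's raw count there."""
    return {medium: ipm(count, SUBCORPUS_SIZES[medium]) for medium, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical raw counts of one word, for illustration only.
    result = compare_media({"print": 170, "web": 1_700})
    for medium, value in result.items():
        print(f"{medium}: {value:.1f} ipm")
```

In this hypothetical case, the word occurs ten times more often in absolute terms on the web (1,700 vs. 170), yet its relative frequency there is only half as high (50 vs. 100 ipm).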
  
===== Acknowledgments =====
  
We would like to thank our colleagues who repeatedly offered valuable advice and selfless help towards the successful completion of C3P: Michal Křen, Václav Cvrček, Petr Plecháč and Robert Kolár. Our annotators, students of the Faculty of Arts of Charles University, deserve no less gratitude:
Šárka Kadavá, Jan Musil, Ondřej Pavlík, Milan Pavlovič, Martin Šplíchal, Lukáš Tomášek and Štěpán Truhlařík.