
Differences

This shows you the differences between two versions of the page.

en:cnk:ksp [2023/05/18 16:22] michalkren
en:cnk:ksp [2024/11/05 10:31] (current) – [The composition of C3P] michalskrabal
Line 4: Line 4:
  
 <WRAP right 35%> <WRAP right 35%>
-^ <fs medium>Name</fs> ^^ <fs medium>FicTree</fs> ^ +^ <fs medium>Name</fs> | <fs medium>KSP • v1</fs> | <fs medium>KSP • v2</fs> |
-^ ::: ^ Number of tokens | 42 435 867 |   +^ Number of tokens | 42 435 867 | 43 224 671 |  
-^ ::: ^ Number of tokens (excl. punctuation) | 35 506 057 |   +^ Number of tokens (excl. punctuation) | 35 506 057 | 37 508 079 |  
-^ ::: ^ Number of word forms | 900 203 |   +^ Number of word forms | 900 203 | 881 306 | 
-^ ::: ^ Number of lemmas | 378 233 | +^ Number of lemmas | 378 233 | 339 907 |
-^ ::: ^ Publication date | 2022  |+^ Publication date | 2022  | 2024 |
 </WRAP> </WRAP>
  
 ===== The composition of C3P =====
  
-C3P currently contains approximately 35.5 million running words. The print poetry subcorpus includes about 1.7 million words, coming from 21,478 poems printed in 496 poetry collections by 209 authors. The web component of the corpus contains more than 442,000 poems from six literary forums (liter.cz, pismak.cz, totem.cz, libres.cz, psanci.cz, xxvi.cz), comprising over 34 million words. The texts in the print subcorpus were selected with regard to the generational layers of the contemporary poetry scene. Currently, authors of Generations X and Y (i.e. born after 1965) are represented here, as we continue to expand the corpus towards older generations.+C3P currently contains approximately 37.5 million running words. The print poetry subcorpus includes about 2.7 million words, coming from 27,675 poems printed in 682 poetry collections by 256 authors. The web component of the corpus contains more than 280,000 poems from six literary forums (liter.cz, pismak.cz, totem.cz, libres.cz, psanci.cz, xxvi.cz), comprising over 34 million words. The texts in the print subcorpus were selected with regard to the generational layers of the contemporary poetry scene. Currently, authors of Generations X and Y and baby boomers (i.e. all those born after 1945) are represented here, as we continue to expand the corpus towards older generations.
  
 For details on building C3P, see the studies below.
Line 35: Line 35:
  
 We would like to thank our colleagues who have repeatedly contributed valuable advice and selfless help to the successful completion of C3P: Michal Křen, Václav Cvrček, Petr Plecháč and Robert Kolár. Our annotators from among the students of the Faculty of Arts of Charles University deserve no fewer thanks, namely:
-Šárka Kadavá, Jan Musil, Ondřej Pavlík, Milan Pavlovič, Martin Šplíchal, Lukáš Tomášek and Štěpán Truhlařík. +Šárka Kadavá, Tereza Marková, Barbora Mitrengová, Barbora Mlchová, Jan Musil, Ondřej Pavlík, Milan Pavlovič, Martin Šplíchal, Lukáš Tomášek and Štěpán Truhlařík.
  
 C3P is supported by //Premium Academiae// funding awarded by the Czech Academy of Sciences to Prof. Pavel Janoušek. Thank you!
Line 41: Line 41:
 ===== How to cite C3P =====
 <WRAP round tip 70%> <WRAP round tip 70%>
-Škrabal, M. – Piorecký, K. – Procházka, P. – Jeziorský, T.: Korpus současné poezie, verze 1.0 z 29. 6. 2022. Ústav Českého národního korpusu FF UK – Ústav pro českou literaturu AV ČR, v. v. i., Praha 2022. Available from WWW: http://www.korpus.cz +Piorecký, K. – Škrabal, M. – Jeziorský, T.: The Corpus of Contemporary Czech Poetry, version 2 from 13. 9. 2024. Ústav Českého národního korpusu FF UK – Ústav pro českou literaturu AV ČR, v. v. i., Praha 2024. Available from WWW: http://www.korpus.cz
  
-Piorecký, K. – Škrabal, M.: Vícejazyčnost v současné české poezii. Několik úvodních postřehů z korpusové perspektivy. Slovenská literatura 6/2020, s. 568--583. +Škrabal, M. – Piorecký, K. – Procházka, P. – Jeziorský, T.: The Corpus of Contemporary Czech Poetry, version 1 from 29. 6. 2022. Ústav Českého národního korpusu FF UK – Ústav pro českou literaturu AV ČR, v. v. i., Praha 2022. Available from WWW: http://www.korpus.cz
  
-Škrabal, M. – Piorecký, K.: The Corpus of Contemporary Czech Poetry: A database for research on contemporary poetic language across media. Digital Scholarship in the Humanities XX/2022, FIXME s. 1--14. https://doi.org/10.1093/llc/fqac013 +Piorecký, K. – Škrabal, M.: Vícejazyčnost v současné české poezii. Několik úvodních postřehů z korpusové perspektivy. Slovenská literatura 6/2020, p. 568--583.
 + 
 +Škrabal, M. – Piorecký, K.: The Corpus of Contemporary Czech Poetry: A database for research on contemporary poetic language across media. Digital Scholarship in the Humanities 4/2022, p. 1240--1253. https://doi.org/10.1093/llc/fqac013
 </WRAP> </WRAP>