en:cnk:veda [2024/01/23 11:19] jankocek – en:cnk:veda [2024/02/25 20:40] (current) – [How to cite Corpus of Academic Czech] michalkren
====== Corpus of Academic Czech ======
  
The Corpus of Academic Czech is a complement to the [[https://korpus.cz/frazova-banka|Phrase Bank of Academic Czech]]. It includes only untranslated Czech-language texts published after 2010 in scientific journals indexed in the Web of Science or Scopus databases or, in some cases, in the EBSCO databases. A further criterion is the genre of the text: the corpus includes only research studies and review articles, not, for example, book reviews or conference reports. In most cases, the texts are the versions that entered the final editing stage, i.e. they had not yet undergone final editing or proofreading. The corpus contains articles from a total of 21 Czech-language scientific journals and represents all six broad fields of science defined in the [[https://doi.org/10.1787/9789264239012-en|Frascati Manual]]. The detailed composition of the corpus is given in the table below. The predominance of the social sciences and humanities reflects the fact that relatively few Czech-language scientific articles are published in other fields.
  
^ Field ^ Title ^ Word count ^
| 1 Natural sciences | |  **1 951 029** |
| | Geografie |  733 885 |
| | Chemické listy |  1 217 144 |
| 2 Engineering and technology | |  **534 739** |
| | Paliva |  534 739 |
| 3 Medical and health sciences | |  **1 811 902** |
| | Cor et Vasa |  643 254 |
| | Česká a slovenská neurologie a neurochirurgie |  1 168 648 |
| 4 Agricultural and veterinary sciences | |  **406 257** |
| | Zprávy lesnického výzkumu |  406 257 |
| 5 Social sciences | |  **5 120 839** |
| | Československá psychologie |  856 683 |
| | Český lid |  778 212 |
| | … |  … |
| | Studia paedagogica |  673 108 |
| | Vojenské rozhledy |  205 899 |
| 6 Humanities and the arts | |  **5 434 650** |
| | Archeologické rozhledy |  1 289 072 |
| | Cornova |  304 773 |
| | … |  … |
| | Slovo a slovesnost |  760 468 |
| | Studia theologica |  768 761 |
^ Total ^ ^ 15 259 416 ^
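The short Python sketch below is a quick consistency check on the table. It sums the six field subtotals and confirms that they match the reported total of 15 259 416 words. It then computes each field's share of the corpus. Only the field subtotals are used, because some per-journal rows are not listed here. The names `FIELD_WORDS`, `REPORTED_TOTAL` and `field_shares` are illustrative and do not belong to any corpus tool.

```python
# Field subtotals (word counts) taken from the composition table.
# Only the six field subtotals are used; per-journal rows are not needed.
FIELD_WORDS: dict[str, int] = {
    "Natural sciences": 1_951_029,
    "Engineering and technology": 534_739,
    "Medical and health sciences": 1_811_902,
    "Agricultural and veterinary sciences": 406_257,
    "Social sciences": 5_120_839,
    "Humanities and the arts": 5_434_650,
}
REPORTED_TOTAL = 15_259_416  # the "Total" row of the table


def field_shares(field_words: dict[str, int]) -> dict[str, float]:
    """Return each field's share of the summed word count as a fraction (0..1)."""
    total = sum(field_words.values())
    return {field: words / total for field, words in field_words.items()}


if __name__ == "__main__":
    # The subtotals should add up exactly to the reported total.
    assert sum(FIELD_WORDS.values()) == REPORTED_TOTAL
    shares = field_shares(FIELD_WORDS)
    # Print fields from the largest to the smallest share.
    for field, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{field:40s} {share:6.1%}")
    combined = shares["Social sciences"] + shares["Humanities and the arts"]
    print(f"Social sciences + humanities: {combined:.1%}")
```

The social sciences and the humanities together account for roughly 69% of the words. This is the predominance described in the introductory paragraph.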
  
The corpus totals more than 15 million words (almost 20 million [[terms:token|tokens]]) in 3,394 scientific articles. Its technical processing is based on the corpora of the [[SYN|SYN]] series. The main difference from the SYN series is that documents here correspond to individual articles, not to whole journal issues. In addition, each document (article) is further divided into sections (//<div>//) corresponding to parts of the text with an explicit class label, which takes the values //introduction//, //discussion//, //conclusion// and //unknown//. This division was obtained by heuristic procedures and is therefore not always reliable. Metadata (authors, article title, issue, year of publication, etc.) is available for all documents and has undergone extensive manual revision. The lemmatization and morphological tagging of the corpus correspond to [[SYN2020|SYN2020]].
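Here is a minimal Python sketch of the document and section structure. It counts tokens per section class in a tiny, made-up document. The sketch assumes a simplified "vertical" layout, loosely modeled on the vertical text format used by corpus managers such as Manatee. In this layout each token sits on its own line (word and lemma separated by a tab), and structural tags such as `<doc>` and `<div class="...">` sit on separate lines. The sample text, its columns and attributes, and the names `SAMPLE`, `DIV_OPEN` and `tokens_per_div_class` are all hypothetical. They are not the corpus's actual export format.

```python
import re
from collections import Counter

# Hypothetical sample in a simplified vertical layout (word<TAB>lemma per line).
# Attribute names and columns are illustrative assumptions only.
SAMPLE = """\
<doc id="article-1" year="2015">
<div class="introduction">
Tato\ttento
studie\tstudie
.\t.
</div>
<div class="unknown">
Data\tdata
byla\tbýt
získána\tzískat
.\t.
</div>
<div class="conclusion">
Závěrem\tzávěr
.\t.
</div>
</doc>
"""

# Matches an opening section tag and captures its class value.
DIV_OPEN = re.compile(r'<div\s+class="([^"]+)"\s*>')


def tokens_per_div_class(vertical: str) -> Counter:
    """Count token lines inside each <div class="..."> section.

    Tokens outside any <div> are ignored. Assumes no token line starts with '<'.
    """
    counts: Counter = Counter()
    current = None  # class of the currently open <div>, or None
    for raw in vertical.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = DIV_OPEN.match(line)
        if match:
            current = match.group(1)
        elif line == "</div>":
            current = None
        elif line.startswith("<"):
            continue  # other structural tags, e.g. <doc ...> or </doc>
        elif current is not None:
            counts[current] += 1
    return counts


if __name__ == "__main__":
    for section_class, n in tokens_per_div_class(SAMPLE).items():
        print(section_class, n)
```

The heuristic section labels in the corpus are not always reliable. Any per-section counts are therefore best read as approximate.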
  
The authors would like to thank the editors of the journals included in the corpus, without whose support the Corpus of Academic Czech could not have been created.

====== How to cite the Corpus of Academic Czech ======
<WRAP round tip 70%>
Vondřička, P. – Kaderka, P. – Hoffmannová, J. – Homoláč, J. – Kocek, J. – Kopecký, J. – Křen, M. – Sherman, T.: //Korpus akademické češtiny, verze 1 z 20. 11. 2023//. Praha: Ústav Českého národního korpusu FF UK – Ústav pro jazyk český AV ČR, Praha 2023. Available at: http://www.korpus.cz

Homoláč, J. – Křen, M. – Kašpárková, A. – Etchegoyen Rosolová, K. – Hoffmannová, J. – Kaderka, P. – Kopecký, J. – Sherman, T. – Vondřička, P.: Akademické psaní a frázové banky. //Slovo a slovesnost// 84(4), 2023, 303-321. https://doi.org/10.58756/s4348418.
</WRAP>