Differences

This shows you the differences between two versions of the page.

en:cnk:uvod [2023/12/08 11:41] – [Corpora of the Czech National Corpus project] michalkren
en:cnk:uvod [2024/02/29 21:00] (current) michalkren
Line 7:
 ^ corpus ^ size (word count) ^  lemmas  ^ morphological tags ^  released((For versioned corpora (e.g. [[en:cnk:syn|SYN]] or [[en:cnk:intercorp|InterCorp]]), the year when the first version was released is also stated.))  ^ characteristic features ^
 | **General corpora** ||||||
-| [[en:cnk:syn|SYN]] ([[en:cnk:syn:verze11|version 11]]) |  5G |  ✓  |  ✓  |  2010–2022  | versioned corpus, unification of all the SYN-series synchronic written corpora |
+| [[en:cnk:syn|SYN]] ([[en:cnk:syn:verze12|version 12]]) |  5G |  ✓  |  ✓  |  2010–2023  | versioned corpus, unification of all the SYN-series synchronic written corpora |
 | [[en:cnk:syn2020|SYN2020]] |  100M |  ✓  |  ✓  |  2020  | reference representative corpus, most of the texts are from 2014--2019 |
 | [[en:cnk:syn2015|SYN2015]] |  100M |  ✓  |  ✓  |  2015  | reference representative corpus, most of the texts are from 2010--2014, with new classification of texts |
Line 43:
 | [[en:cnk:link|LINK]] |  1.8M |  ✓  |  ✓  |  2010  | non-reference corpus of linguistic texts |
 | [[en:cnk:totalita|Totalita]] |  12,9M |  ✓  |  ✓  |  2010  | written language of the communist regime |
+| [[en:cnk:veda|Věda]] |  15M |  ✓  |  ✓  |  2023  | corpus of scientific Czech, complement to the [[https://db.korpus.cz/search/acphrase|Phrase Bank of Academic Czech]] |
 ^ <fs large>Spoken synchronic corpora</fs> ^^^^^^
 ^ corpus ^ size (word count) ^  lemmas  ^ morphological tags ^  year  ^ characteristic features ^