en:cnk:uvod [2023/12/08 11:41] – [Corpora of the Czech National Corpus project] michalkren | en:cnk:uvod [2024/02/29 21:00] (current) – michalkren |
---|
^ corpus ^ size (word count) ^ lemmas ^ morphological tags ^ released((For versioned corpora (e.g. [[en:cnk:syn|SYN]] or [[en:cnk:intercorp|InterCorp]]), the year when the first version was released is also stated.)) ^ characteristic features ^ | ^ corpus ^ size (word count) ^ lemmas ^ morphological tags ^ released((For versioned corpora (e.g. [[en:cnk:syn|SYN]] or [[en:cnk:intercorp|InterCorp]]), the year when the first version was released is also stated.)) ^ characteristic features ^ |
| **General corpora** |||||| | | **General corpora** |||||| |
| [[en:cnk:syn|SYN]] ([[en:cnk:syn:verze11|version 11]]) | 5G | ✓ | ✓ | 2010–2022 | versioned corpus, unification of all the SYN-series synchronic written corpora | | | [[en:cnk:syn|SYN]] ([[en:cnk:syn:verze12|version 12]]) | 5G | ✓ | ✓ | 2010–2023 | versioned corpus, unification of all the SYN-series synchronic written corpora | |
| [[en:cnk:syn2020|SYN2020]] | 100M | ✓ | ✓ | 2020 | reference representative corpus, most of the texts are from 2014--2019 | | | [[en:cnk:syn2020|SYN2020]] | 100M | ✓ | ✓ | 2020 | reference representative corpus, most of the texts are from 2014--2019 | |
| [[en:cnk:syn2015|SYN2015]] | 100M | ✓ | ✓ | 2015 | reference representative corpus, most of the texts are from 2010--2014, with new classification of texts | | | [[en:cnk:syn2015|SYN2015]] | 100M | ✓ | ✓ | 2015 | reference representative corpus, most of the texts are from 2010--2014, with new classification of texts | |
| [[en:cnk:link|LINK]] | 1.8M | ✓ | ✓ | 2010 | non-reference corpus of linguistic texts | | | [[en:cnk:link|LINK]] | 1.8M | ✓ | ✓ | 2010 | non-reference corpus of linguistic texts | |
| [[en:cnk:totalita|Totalita]] | 12.9M | ✓ | ✓ | 2010 | written language of the communist regime | | | [[en:cnk:totalita|Totalita]] | 12.9M | ✓ | ✓ | 2010 | written language of the communist regime | |
| | [[en:cnk:veda|Věda]] | 15M | ✓ | ✓ | 2023 | corpus of scientific Czech, complement to the [[https://db.korpus.cz/search/acphrase|Phrase Bank of Academic Czech]] | |
^ <fs large>Spoken synchronic corpora</fs> ^^^^^^ | ^ <fs large>Spoken synchronic corpora</fs> ^^^^^^ |
^ corpus ^ size (word count) ^ lemmas ^ morphological tags ^ year ^ characteristic features ^ | ^ corpus ^ size (word count) ^ lemmas ^ morphological tags ^ year ^ characteristic features ^ |