Page en:cnk:uvod, section "Corpora of the Czech National Corpus project": current revision of 2025/03/17 16:56 (previous revision: 2024/11/08 12:14), both edited by michalkren.

^ corpus ^ size (word count)((k = thousand, M = million, G = billion words.)) ^ lemmas ^ morphological tags ^ released((For versioned corpora (e.g. [[en:cnk:syn|SYN]] or [[en:cnk:intercorp|InterCorp]]), the year in which the first version was released is given as well.)) ^ characteristic features ^
| **General corpora** ||||||
| [[en:cnk:syn|SYN]] ([[en:cnk:syn:verze13|version 13]]) | 5.3G | ✓ | ✓ | 2010–2024 | versioned corpus combining all synchronic written corpora of the SYN series |
| [[en:cnk:syn2020|SYN2020]] | 100M | ✓ | ✓ | 2020 | reference representative corpus, most of the texts are from 2014--2019 |
| [[en:cnk:syn2015|SYN2015]] | 100M | ✓ | ✓ | 2015 | reference representative corpus, most of the texts are from 2010--2014, with a new classification of texts |
| [[en:cnk:kh-dopisy|KH-DOPISY]] | 500k | ✗ | ✗ | 2017 | corpus of Karel Havlíček's correspondence |
| [[en:cnk:kh-noviny|KH-NOVINY]] | 1M | ✗ | ✗ | 2021 | corpus of Karel Havlíček's journalism |
| [[en:cnk:klaus|Klaus]] | 1.5M | ✓ | ✓ | 2024 | corpus of Václav Klaus' texts |
| [[en:cnk:orwell|ORWELL]] | 80k | ✓ | ✓ | 2003 | Orwell's novel [[wp>Nineteen_Eighty-Four|1984]], manually annotated |
| **Specialized corpora** ||||||
| [[en:cnk:codit|CODIT]] | 27M | ✗ | ✗ | 2021 | diachronic corpus of Italian covering the period from the 13th century to 1947 |
| [[en:cnk:dotko|DOTKO]] (version 2) | 15.5M | ✓ | ✗ | 2010 | non-reference corpus of Lower Sorbian |
| [[en:cnk:eebo|EEBO]] (version 2) | 1.3G | ✓ | ✓ | 2015 | English texts from the period 1475--1700, [[https://textcreationpartnership.org/tcp-texts/eebo-tcp-early-english-books-online/|Early English Books Online]] |
| [[en:cnk:hotko|HOTKO]] (version 2) | 36M | ✗ | ✗ | 2013 | non-reference corpus of Upper Sorbian |
| [[en:cnk:lEstRepublicain|lEstRepublicain]] | 73M | ✓ | ✓ | 2013 | corpus of the French newspaper L'Est Républicain |
| [[en:cnk:nkjp|NKJP_1M]] | 1M | ✓ | ✓ | 2018 | manually annotated one-million-word subcorpus of the National Corpus of Polish |
| [[en:cnk:obc|OBC]] | 24M | ✗ | ✓ | 2021 | [[http://fedora.clarin-d.uni-saarland.de/oldbailey/index.html|Old Bailey Corpus]], trial proceedings from 1720--1913 |