Differences

This shows you the differences between two versions of the page.

--- en:cnk:uvod [2019/12/20 11:48] – [Corpora of the Czech National Corpus project] michalkren
+++ en:cnk:uvod [2019/12/20 15:36] – [Corpora of the Czech National Corpus project] michalkren
@@ Line 25: / Line 25: @@
 | [[en:cnk:ksk-dopisy|KSK-DOPISY]] |  800k |  ✗  |  ✗  |  2006  | transcriptions of handwritten correspondence from 1990--2004|
 | [[en:cnk:link|LINK]] |  1.8M |  ✓  |  ✓  |  2010  | non-reference corpus of linguistic texts |
+| [[en:cnk:net|NET]] |  41M |  ✓  |  ✓  |  2019  | corpus of semi-official internet communication |
 | [[en:cnk:orwell|ORWELL]] |  80k |  ✓  |  ✓  |  2003  | Orwell's novel [[wp>Nineteen_Eighty-Four|1984]], manually annotated  |
 | [[en:cnk:skript2012|SKRIPT2012]] |  590k |  ✓  |  ✓  |  2013  | corpus of school essays |
@@ Line 30: / Line 31: @@
 ^ corpus ^ size (word count) ^  lemmas  ^ morphological tags ^  year  ^ characteristic features ^
 | **General corpora** ||||||
+| [[en:cnk:orator|ORATOR]] |  580k |  ✓  |  ✓  |  2019  | reference corpus of monologues with one-layer transcription |
 | [[en:cnk:ortofon|ORTOFON]] |  1M |  ✓  |  ✓  |  2017  | reference representative corpus of informal spoken Czech with two-layer transcription (covers Bohemia, Moravia and Silesia) |
 | [[en:cnk:oral|ORAL]] |  5.4M |  ✓  |  ✓  |  2017  | reference corpus of informal spoken Czech (covers Bohemia, Moravia and Silesia) |
@@ Line 48: / Line 50: @@
 ^ corpus ^ size (word count) ^  lemmas  ^ morphological tags ^  year  ^ characteristic features ^
 | **Parallel corpora** ||||||
-| [[en:cnk:intercorp|InterCorp]] ([[en:cnk:intercorp:verze12|version 12]]) |  1.7G |  (✓)  |  (✓)  |  2008  | versioned parallel corpus being compiled as a part of the InterCorp project |
+| [[en:cnk:intercorp|InterCorp]] ([[en:cnk:intercorp:verze12|version 12]]) |  1.7G |  (✓)  |  (✓)  |  2008  | versioned parallel corpus for 40 languages |
 | **Comparable corpora** ||||||
 | [[en:cnk:aranea|Aranea]] |  1G |  ✓  |  ✓  |  2014  | comparable web corpora for several languages (cs, de, en, es, fi, fr, hu, it, nl, pl, pt, ru, sk, zh) |
