
Welcome to the Czech National Corpus wiki


The Czech National Corpus (CNC) project was set up in 1994 to make extensive linguistic data available for teaching and research in the form of electronic corpora. The CNC currently provides access to more than three billion words in synchronic and diachronic corpora, both spoken and written, parallel and monolingual (see the overview). The CNC also develops specialized tools for working with these corpora.

The CNC project is run mainly by the staff of two departments of the Faculty of Arts, Charles University: the Institute of the Czech National Corpus and the Institute of Theoretical and Computational Linguistics. Data collection and the coordination of individual activities are carried out with the help of more than two hundred external collaborators from all over the Czech Republic.

What information will you find here?

This wiki serves CNC users not only as a source of information about the CNC (descriptions of the public corpora and their documentation, application manuals), but also as a continuously updated knowledge base for corpus linguistics. The main parts of the wiki are:

Manuals for CNC applications
Overview of corpora available within the CNC
Tutorial for working with the EEBO in 8 lessons
Index of basic concepts in corpus linguistics (in Czech)
List of sources and abbreviations (in Czech)

Frequently searched pages

Manuals for CNC applications

What is a corpus?

A language corpus is an extensive collection of authentic textual data (written or spoken) stored electronically in a uniform format, so that it can easily be searched for various linguistic phenomena, especially words and phrases (collocations). Corpora differ from a plain text archive or database primarily in that they are carefully compiled with a specific research purpose in mind: they may, for example, be designed to represent contemporary spoken language, written language, or one of its parts such as journalistic texts. A corpus shows linguistic phenomena in their natural context, which makes it possible to study language on the basis of actual usage data on a scale that would previously have been unthinkable.

User support

The Helpdesk is available to all users, who are invited to post questions about working with the CNC (creating queries, the specifics of individual corpora, etc.). Most questions are answered within one working day.

User support also covers reporting errors in CNC applications and submitting suggestions for improvement. A link to the form for such reports, “Report an error”, can be found at the very bottom of every application.