CzeSL-man – a corpus of non-native Czech with manual error annotation in a simplified tiered scheme

CzeSL-man is the name used in the KonText search interface for CzeSL-man v1 searchable, a corpus of annotated texts written by non-native speakers of Czech. It comprises a subset of the texts from the CzeSL-SGT corpus. The corpus in the format of the feat annotation editor can be downloaded under the name CzeSL-man v1 downloadable.

The manual error annotation used in CzeSL-man v1 searchable is a simplified version of the two-stage annotation scheme created for the CzeSL project. One consequence of the simplification is that the roles of the source text and its annotation are reversed. The base text is a corrected version of the original text, so the words of the corrected version are the tokens of this corpus, while the original text is available only as part of the annotation. Not all words from the original text are retained, and their order may follow the word order of the correction rather than that of the original.
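To make the reversal concrete, here is a minimal Python sketch under stated assumptions. The `Token` class, its fields `word`, `orig` and `errors`, the functions `corrected_text` and `original_view`, and the error labels are hypothetical illustrations, not the corpus's actual export format, and the learner sentence is invented. The sketch shows that when tokens are words of the corrected text, reading off their original forms recovers the learner text only partially: a word dropped by the correction is gone, and the order is that of the correction.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Token:
    """A corpus token: one word of the corrected text (hypothetical layout)."""
    word: str                   # corrected form; this is what the corpus indexes
    orig: Optional[str] = None  # learner's original form; None if the correction inserted the word (assumed)
    errors: list = field(default_factory=list)  # error labels (hypothetical names)

def corrected_text(tokens):
    """The base text: corrected words in corrected order."""
    return " ".join(t.word for t in tokens)

def original_view(tokens):
    """Approximate the learner text from the annotation.

    Words removed by the correction have no token and cannot be recovered,
    and the result follows the word order of the correction.
    """
    return " ".join(t.orig for t in tokens if t.orig is not None)

# Invented learner sentence: "Včera byl jsem v v Praha"
# (clitic in the wrong position, doubled preposition, wrong case).
tokens = [
    Token("Včera", "Včera"),
    Token("jsem", "jsem", ["word_order"]),
    Token("byl", "byl", ["word_order"]),
    Token("v", "v"),
    Token("Praze", "Praha", ["inflection"]),
]

print(corrected_text(tokens))  # Včera jsem byl v Praze
print(original_view(tokens))   # Včera jsem byl v Praha
```

The second line differs from what the learner actually wrote ("Včera byl jsem v v Praha"): the doubled "v" has no token, and "jsem byl" appears in the corrected order.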

The annotation also contains error types and – for the corrected text – morphosyntactic categories, lemmas, and dependency syntactic structure and functions. The texts are also accompanied by metadata about the author and the text.
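The lemma and dependency layers can be pictured as a tree over the corrected tokens. The following is a minimal Python sketch for the invented corrected sentence "Včera jsem byl v Praze". The `Node` class, its fields, the 1-based `head` convention and the Prague-style function labels (`Pred`, `AuxV`, `AuxP`, `Adv`) are assumptions chosen for illustration, not taken from the corpus documentation. Morphosyntactic tags and text metadata are omitted for brevity.

```python
from dataclasses import dataclass

@dataclass
class Node:
    form: str   # word of the corrected text
    lemma: str  # base form
    head: int   # 1-based index of the governing word; 0 marks the root (assumed convention)
    func: str   # syntactic function (Prague-style labels, assumed)

def dependencies(sentence):
    """List (dependent, function, governor) triples; the root's governor is 'ROOT'."""
    triples = []
    for node in sentence:
        governor = "ROOT" if node.head == 0 else sentence[node.head - 1].form
        triples.append((node.form, node.func, governor))
    return triples

def forms_of_lemma(sentence, lemma):
    """Find all word forms of one lemma, as a lemma query would."""
    return [node.form for node in sentence if node.lemma == lemma]

# Invented corrected sentence "Včera jsem byl v Praze".
sentence = [
    Node("Včera", "včera", 3, "Adv"),
    Node("jsem", "být", 3, "AuxV"),
    Node("byl", "být", 0, "Pred"),
    Node("v", "v", 3, "AuxP"),
    Node("Praze", "Praha", 4, "Adv"),
]

for triple in dependencies(sentence):
    print(triple)
print(forms_of_lemma(sentence, "být"))  # ['jsem', 'byl']
```

Because lemmas are attached to the corrected forms, a lemma search for "být" finds both "jsem" and "byl" even though their surface forms differ.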

For more information on the CzeSL learner corpus project, including an overview of all versions of the CzeSL learner corpus with links to search or download options, see CzeSL – a Learner Corpus of Czech and Rosen et al. (2020).

Citing CzeSL-man

Bedřichová, Z. – Hana, J. – Hrdlička, M. – Hrdličková, T. – Janeš, P. – Jelínek, T. – Lundáková, K. – Petkevič, V. – Pierscieniak, P. – Poláčková, M. – Rosen, A. – Skoumalová, H. – Sládek, Š. – Šebesta, K. – Škodová, S. – Šormová, K. – Štindlová, B. – Toufarová, D.: CzeSL-man – a corpus of non-native Czech with manual error annotation in a simplified tiered scheme, version v1 searchable, dated 18 November 2020. Institute of the Czech National Corpus, Prague 2020. Available on-line: