Differences

This shows you the differences between two versions of the page.

en:cnk:czesl-plain [2018/08/07 12:39] alexandrrosen
en:cnk:czesl-plain [2018/08/07 12:47] – metadata missing alert alexandrrosen
Line 19:
  
 The essays and handwritten school exams were collected as manuscripts, scanned and transcribed into electronic form. Academic texts by non-native speakers were obtained from the authors already in electronic form. While these texts were not written in a class or with the aim of being included in a corpus, their final form may have been affected by an automatic spellchecker.
 +
 +
 +Texts by non-native speakers (the **ciz** part), extended by some newer texts, are available as the CzeSL-sgt corpus, together with metadata and automatically performed morphosyntactic and error annotation, including the identification of incorrect forms. The CzeSL-plain corpus is also available from the LINDAT-Clarin repository as AKCES 3 and AKCES 4. See also CzeSL – a Learner Corpus of Czech.
  
 Although the CzeSL-plain corpus does not contain any linguistic annotation at the moment, its next release will include more texts (the corpus is thus non-reference) and provide automatic identification of incorrect forms and morphosyntactic tags. Some of the texts included in the CzeSL-plain corpus are annotated with correct forms, error labels, morphosyntactic tags and lemmas and are due for release under a different purpose-built search interface.