Differences

This shows you the differences between two versions of the page.

en:cnk:czesl-sgt-basic [2019/10/31 19:22] alexandrrosen
en:cnk:czesl-sgt-basic [2019/10/31 19:40] (current) alexandrrosen
Line 1:
 ====== CzeSL-SGT-basic – a corpus of non-native Czech with simplified search options ======
  
-The CzeSL-SGT-basic corpus is based on the CzeSL-SGT corpus (//**Cze**ch as a **S**econd **L**anguage with **S**pelling, **G**rammar and **T**ags//), which includes transcriptions of essays written by non-native speakers of Czech, extending the “foreign” (ciz) part of the [[cnk:CzeSL-plain]] corpus by texts collected in 2013. The difference is in options available in the search interface: CzeSL-SGT-basic offeres a reduced set of metadata items.
+The CzeSL-SGT-basic corpus is based on the CzeSL-SGT corpus (//**Cze**ch as a **S**econd **L**anguage with **S**pelling, **G**rammar and **T**ags//), which includes transcriptions of essays written by non-native speakers of Czech, extending the “foreign” (ciz) part of the [[cnk:CzeSL-plain]] corpus by texts collected in 2013. The difference is in options available in the search interface: CzeSL-SGT-basic offers a reduced set of metadata items.
  
 Word forms are tagged by word class, morphological categories and base forms (lemmas). Some forms are corrected and the resulting texts are tagged again. Original and corrected forms are compared and error labels are assigned. The annotation is assigned automatically, which necessarily results in some inaccuracy and error rate.