Differences
This shows you the differences between two versions of the page.
| en:cnk:orator [2025/06/06 13:36] – [Morphological tagging of the ORATOR corpus] martinawaclawicova | en:cnk:orator [2026/01/23 11:48] (current) – [Morphological tagging of the ORATOR corpus] krivan | ||
|---|---|---|---|
| Line 28: | Line 28: | ||
| ===== Morphological tagging of the ORATOR corpus ===== | ===== Morphological tagging of the ORATOR corpus ===== | ||
| - | The ORATOR v3 corpus is automatically [[en: | + | The ORATOR v3 corpus is automatically [[en: |
| - | Substandard variants and forms typical of dialects and spontaneous speech are also tagged in the corpus (according to the ORTOFON corpus). | + | Substandard variants and forms typical of dialects and spontaneous speech are also tagged in the corpus (according to the ORTOFON corpus, see [[en: |
| The following specific tags are used in the first tag position (word type): | The following specific tags are used in the first tag position (word type): | ||
| Line 54: | Line 54: | ||
| ====== ORATOR v3 (2025) ====== | ====== ORATOR v3 (2025) ====== | ||
| - | The ORATOR corpus in its third version contains the same recordings and transcripts as the second version (i.e. over 1.5 million tokens), but they are newly annotated according to the SYN2020 standard. The genphone attribute is also newly included in the corpus, indicating the automatically generated phonetic form of a word. In addition, several transcription corrections have been made. | + | The ORATOR corpus in its third version contains the same recordings and transcripts as the second version (i.e. over 1.5 million tokens) but annotated according to the new Unified CNC Annotation Scheme using a language model also trained on spoken data. The '' |
| ===== How to cite ===== | ===== How to cite ===== | ||