Differences

This shows you the differences between two versions of the page.

--- en:cnk:ortofon [2025/01/17 22:57] – [ORTOFON v3 (2024)] michalkren
+++ en:cnk:ortofon [2026/06/30 12:38] (current) – michalkren
@@ Line 1: / Line 1: @@
 ====== Corpus of informal spoken Czech with multi-tier transcription: ORTOFON ======
-The ORTOFON corpus captures spontaneous spoken language used in informal situations between speakers who know each other. It follows the [[en:cnk:oral|ORAL]] series of informal spoken Czech corpora in its data collection design. The recordings are transcribed in two tiers - orthographic and phonetic. Together with the [[en:cnk:dialekt|DIALEKT]] corpus, these are the first two spoken Czech corpora to have multi-tier transcription. Similar to the [[en:cnk:oral2013|ORAL2013]] corpus, speakers come from all over the Czech Republic and selected sociological information is collected about them. The corpus is lemmatized and morphologically tagged. The transcription is linked to the audio track and the audio can be played back in the KonText corpus interface.
+The ORTOFON corpus captures spontaneous spoken language used in informal situations between speakers who know each other. It follows the [[en:cnk:oral|ORAL]] series of informal spoken Czech corpora in its data collection design. The recordings are transcribed in two tiers - orthographic and phonetic, using the [[https://archive.mpi.nl/tla/elan|ELAN]] tool, developed in the Max Planck Institute for Psycholinguistics, Nijmegen((ELAN (Version 7.1) [Computer software]. (2026). Nijmegen: Max Planck Institute for Psycholinguistics. Retrieved from https://archive.mpi.nl/tla/elan
+)). Together with the [[en:cnk:dialekt|DIALEKT]] corpus, these are the first two spoken Czech corpora to have multi-tier transcription. Similar to the [[en:cnk:oral2013|ORAL2013]] corpus, speakers come from all over the Czech Republic and selected sociological information is collected about them. The corpus is lemmatized and morphologically tagged. The transcription is linked to the audio track and the audio can be played back in the KonText corpus interface.
 The ORTOFON corpus allows us to explore various aspects of spoken language, i.e. lexis, morphology, syntax, pragmatics, dialogue construction. The corpus is not primarily intended for dialectological ((The [[en:cnk:dialekt|DIALEKT]] corpus is intended for this kind of research.)) or phonetic research, even though a simplified phonetic transcription allows us to verify the existence of pronunciation or regional variants, or phenomena related to pronunciation.
@@ Line 34: / Line 35: @@
 ===== Morphological tagging of the ORTOFON corpus =====
-The ORTOFON v3 corpus is automatically [[en:pojmy:tag|annotated]] with [[en:cnk:syn2020#morphological_tagging|a new morphological tag]] according to the SYN2020 standard. It recognizes [[en:cnk:syn2020#multiple_lemmatization_and_tagging_aggregate|aggregates]] (e.g., //vidělas//, //zač//), uses [[en:cnk:syn2020|double-level lemmatization]], and has a verb tag ([[en:cnk:syn2020#verb_tagging_verbtag|verbtag]]).
+The ORTOFON v3 corpus is automatically [[en:pojmy:tag|annotated]] with [[en:cnk:syn2020#morphological_tagging|a new morphological tag]] according to the [[en:cnk:anotacni_standard_cnk|unified CNC annotation scheme]]. It recognizes [[en:cnk:syn2020#multiple_lemmatization_and_tagging_aggregate|aggregates]] (e.g., //vidělas//, //zač//), uses [[en:cnk:syn2020|double-level lemmatization]], and has a verb tag ([[en:cnk:syn2020#verb_tagging_verbtag|verbtag]]).
 Substandard variants and forms typical of dialects and spontaneous speech are also tagged in the corpus. Special variants of words are distinguished by their own sublemma (e.g. //poslúchat// under the lemma //poslouchat//), special forms tagged only in the spoken corpus have the number 9 in the last tag position (e.g. the form //jezdijó// has the tag  ''%%VB-P---3P-AAI-9%%'').
@@ Line 91: / Line 92: @@
 <WRAP round tip 70%>
+**Corpus as a language resource**
 Lukeš, D. – Kopřivová, M. – Laubeová, Z. – Poukarová, P. – Horký, V. – Jelínek, T. – Křivan, J. – Waclawičová, M. – Benešová, L. – Škarpová, M.:  //ORTOFON v3: Korpus neformální mluvené češtiny s víceúrovňovým přepisem//. Ústav Českého národního korpusu FF UK, Praha 2024. Retrieved from: http://www.korpus.cz
@@ Line 96: / Line 100: @@
 Kopřivová, M. – Komrsková, Z. – Lukeš, D. – Poukarová, P. – Škarpová, M.: //ORTOFON v1: Korpus neformální mluvené češtiny s víceúrovňovým přepisem//. Ústav Českého národního korpusu FF UK, Praha 2017. Retrieved from: http://www.korpus.cz
+**References**
 Komrsková, Z. – Kopřivová, M. – Lukeš, D. – Poukarová, P. – Goláňová, H. (2017): New Spoken Corpora of Czech: ORTOFON and DIALEKT. //Jazykovedný časopis//, 68(2), 219-228. ISSN 0021-8897.
