en:cnk:lemtag_mluv [2017/07/07 11:36] – [Related links] veronikapojarova | en:cnk:lemtag_mluv [2017/07/18 15:12] (current) – [Lemmatization and tagging in spoken corpora] michalkren |
---|
**Tagging method** | **Tagging method** |
| |
[[en:seznamy:tagy#pozice_1_-_slovni_druh|The morphological tagging system]] is the same as for written corpora; however, some tags for associated categories are retained (e.g. X for any gender, Y for masculine animate or inanimate, etc.), just as they are contained in the morphological dictionary MorfFlex CZ (Hajič–Hlaváčová, 2013). This dictionary was manually and semiautomatically supplemented with frequently unrecognised forms (e.g. dialectal suffixes, forms with varying quantity, prothetic v). The stochastic tagging system MorphoDiTa (Straka et al., 2014) was used for the tagging itself. | [[seznamy:tagy#pozice_1_-_slovni_druh|The morphological tagging system]] (the description is in Czech only) is the same as for written corpora; however, some tags for associated categories are retained (e.g. X for any gender, Y for masculine animate or inanimate, etc.), just as they are contained in the morphological dictionary MorfFlex CZ (Hajič–Hlaváčová, 2013). This dictionary was manually and semiautomatically supplemented with frequently unrecognised forms (e.g. dialectal suffixes, forms with varying quantity, prothetic v). The stochastic tagging system MorphoDiTa (Straka et al., 2014) was used for the tagging itself. |
| |
===== Modifications to the morphological dictionary ===== | ===== Modifications to the morphological dictionary ===== |
===== Tag forms ===== | ===== Tag forms ===== |
| |
The form of the tags corresponds to that of the [[en:seznamy:tagy#pozice_1_-_slovni_druh|morphological tags]] used in the written corpora of the [[en:cnk:syn|SYN]] series before the simplification of the tagging system; it does not include aspect in the 16th position. | The form of the tags corresponds to that of the [[seznamy:tagy#pozice_1_-_slovni_druh|morphological tags]] (Czech only) used in the written corpora of the [[en:cnk:syn|SYN]] series before the simplification of the tagging system; it does not include aspect in the 16th position. |
Apart from these tags, the first position for the word class and the POS attribute can have the following values: | Apart from these tags, the first position for the word class and the POS attribute can have the following values: |
| |
| |
===== Acknowledgements ===== | ===== Acknowledgements ===== |
We would like to thank doc. Klára Osolsobě and Mgr. Dana Hlaváčková, Ph.D. for providing valuable consultations. | We would like to thank doc. Klára Osolsobě and Dr. Dana Hlaváčková for providing valuable consultations. |
| |
===== Sources ===== | ===== Sources ===== |
| |
<WRAP round box 72%> | <WRAP round box 72%> |
[[en:cnk:oral|ORAL]] • [[en:cnk:ortofon|ORTOFON]] • [[en:cnk:dialekt|DIALEKT]] • [[en:pojmy:mluveny|Spoken language corpus]] • [[en:pojmy:atributy_strukturni#strukturni_atributy_korpusu_rady_oral|Structure of the ORAL corpora]] • [[en:kurz:hledani_v_mluvenych_korpusech|Searching in spoken corpora]] • [[en:kurz:hledani_ORTOFON|Searching in the ORTOFON corpus]] • [[en:cnk:dialekt:prace|Searching in the DIALEKT corpus]] | [[en:cnk:oral|ORAL]] • [[en:cnk:ortofon|ORTOFON]] • [[en:cnk:dialekt|DIALEKT]] |
</WRAP> | </WRAP> |