==== Lemmatization and tagging ====

  * If you use [[en:pojmy:lemma|lemmatization]], [[en:pojmy:tag|morphological]] or [[en:cnk:syn2020#verb_tagging_verbtag|verb]] tags (the attributes //lemma//, //tag// or //verbtag// in the SYN series corpora), please also cite one of the following publications:

Tomáš Jelínek, Jan Křivan, Vladimír Petkevič, Hana Skoumalová, Jana Šindlerová (2021): [[https://doi.org/10.1007/978-3-030-83527-9_4|SYN2020: A new corpus of Czech with an innovated annotation]]. In: K. Ekštein – F. Pártl – M. Konopík (eds.), //Text, Speech, and Dialogue.// TSD 2021. Lecture Notes in Computer Science, vol. 12848. Cham: Springer, pp. 48–59.

Jan Křivan, Jana Šindlerová (2022): [[https://asjournals.lib.cas.cz/slovoaslovesnost/article/uuid:286197ce-8b36-43ac-9563-eba2abf8ca0e|Změny v morfologické anotaci korpusů řady SYN: nové možnosti zkoumání české gramatiky a lexikonu]] [Changes in the morphological annotation of the SYN series corpora: new possibilities for studying Czech grammar and lexicon]. //Slovo a slovesnost//, 83, 2/2022, pp. 122–145.

  * You can also cite any of the following publications related to the annotation used:

Jan Hajič (2004): //Disambiguation of Rich Inflection (Computational Morphology of Czech)//. Vol. 1. Praha: Karolinum, Charles University Press.

Milan Straka, Jana Straková, Jan Hajič (2019): Czech Text Processing with Contextual Embeddings: POS Tagging, Lemmatization, Parsing and NER. In: //Proceedings of the 22nd International Conference on Text, Speech and Dialogue (TSD 2019)//. Lecture Notes in Computer Science (ISSN 0302-9743), vol. 11697, pp. 137–150.

  * For the lemmatization and tagging of the spoken ORAL corpus, you can also cite:

Marie Kopřivová, Zuzana Komrsková, David Lukeš, Petra Poukarová (2017): Korpus ORAL: sestavení, lemmatizace a morfologické značkování [The ORAL corpus: compilation, lemmatization and morphological tagging]. In: //Korpus -- gramatika -- axiologie//, 15, pp. 47–67.