~~NOTOC~~
====== Jiří Milička ======

  * [[https://scholar.google.com/citations?hl=en&user=71SLjBMAAAAJ|Google Scholar]], [[https://www.researchgate.net/profile/Jiri-Milicka|ResearchGate]]
  * [[https://www.scopus.com/authid/detail.uri?authorId=55946511400|Scopus]], ORCID: [[https://orcid.org/0000-0001-8605-1199|0000-0001-8605-1199]]

Editorial board membership: [[https://www.tandfonline.com/toc/njql20/current|Journal of Quantitative Linguistics]], [[https://sciendo.com/journal/LF|Linguistic Frontiers]]

Other memberships: [[https://www.iqla.org/|The International Quantitative Linguistics Association]]

===== Focus =====
  * corpus linguistics
  * quantitative linguistics
  * Arabic language

===== Education =====
  * 2010–2016 PhD (Charles University, Prague), thesis: //The Theory of Communication as an Explanatory Principle for the Natural Multilevel Text Segmentation//
  * 2005–2010 MA in Arabic Studies and History of Islamic Countries (Charles University, Prague)

===== Employment =====
  * 2013–2022 Institute of Comparative Linguistics (Charles University, Prague)
  * 2017–present Institute of the Czech National Corpus (Charles University, Prague)

===== Papers =====

==== Preprints ====
  * Milička, J. (2024). Simple stochastic processes behind Menzerath’s Law. arXiv [cs.CL]. Retrieved from http://arxiv.org/abs/2409.00279
  * Milička, J. (2024). Theoretical and Methodological Framework for Studying Texts Produced by Large Language Models. arXiv [cs.CL]. Retrieved from http://arxiv.org/abs/2408.16740

==== 2024 ====
  * Milička, J., Marklová, A., VanSlambrouck, K., Pospíšilová, E., Šimsová, J., Harvan, S., & Drobil, O. (2024). Large language models are able to downplay their cognitive abilities to fit the persona they simulate. //PLOS ONE, 19(3)//, [[https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0298522|e0298522]].
  * Milička, J., & Šebestová, D. (2024). Query a corpus in near-natural language: A human-friendly corpus query language not only for linguists. In S. Buschfeld, P. Ronan, T. Neumaier, A. Weilinghoff, & L. Westermayer (Eds.), //Crossing Boundaries through Corpora: Innovative Approaches to Corpus Linguistics//. John Benjamins. ISBN 9789027215949.

==== 2023 ====
  * Milička, J. (2023). Menzerath’s law: Is it just regression toward the mean? //Glottometrics//, 55. doi: [[https://glottometrics.iqla.org/409-menzeraths-law-is-it-just-regression-toward-the-mean/|10.53482/2023_55_409]].

==== 2022 ====
  * Milička, J., Cvrček, V., & Lukeš, D.: Unpacking lexical intertextuality: Vocabulary shared among texts. In M. Yamazaki, H. Sanada, R. Köhler, S. Embleton, R. Vulanović & E. S. Wheeler (eds.), //Quantitative Approaches to Universality and Individuality in Language// (pp. 101–116). Berlin/Boston: De Gruyter Mouton. DOI: [[https://doi.org/10.1515/9783110763560-009|10.1515/9783110763560-009]].
  * Zemánek, P., & Milička, J.: Frankové očima Arabů v klasickém a moderním období. In O. Lomová, J. Malečková & K. Šíma (eds.), //Setkávání kultur. Identity, ideologie, jazyky// (pp. 233–246). Praha: Univerzita Karlova, Filozofická fakulta. ISBN 978-80-7671-085-6.

==== 2021 ====
  * Milička, J., Cvrček, V., & Lukešová, L.: Modelling crosslinguistic n‑gram correspondence in typologically different languages. //Languages in Contrast, 21(2)//, 217–249. DOI: [[http://doi.org/10.1075/lic.19018.mil|10.1075/lic.19018.mil]]. ISSN 1387-6759.
  * Milička, J., & Houzar, A.: Phonological properties as predictors of text success. In A. Pawłowski, S. Embleton, J. Mačutek & G. Mikros (eds.), //Language and Text: Data, models, information and applications// (pp. 177–194). John Benjamins. ISBN 9789027210104.
  * Matlach, V., Krivochen, D. G., & Milička, J.: A method for the comparison of general sequences via type-token ratio. In A. Pawłowski, S. Embleton, J. Mačutek & G. Mikros (eds.), //Language and Text: Data, models, information and applications// (pp. 37–54). John Benjamins. ISBN 9789027210104.
  * Malá, M., Šebestová, D., & Milička, J.: The expression of time in English and Czech children’s literature. In A. Čermáková, T. Egan, H. Hasselgård & S. Rørvik (eds.), //Time in Languages, Languages in Time// (pp. 283–304). John Benjamins. ISBN 978-90-272-0968-9.
  * Kubát, M., Hůla, J., Chen, X., Čech, R., & Milička, J.: The lexical context in a style analysis: A word embeddings approach. //Corpus Linguistics and Linguistic Theory, 17(2)//, 443–464.

==== 2020 ====
  * Milička, J.: Kolik procent lexikálních výpůjček můžeme očekávat ve slovenském textu? //[[https://www.juls.savba.sk/ediela/sr/2020/1/sr20-1.pdf|Slovenská reč, 85(1)]]//, 76–81.
  * Kováříková, D., Škrabal, M., Cvrček, V., Lukešová, L., & Milička, J.: Lexicographer’s Lacunas or How to Deal with Missing Representative Dictionary Forms on the Example of Czech. //International Journal of Lexicography, 33(1)//, 90–103.

==== 2019 ====
  * Mačutek, J., Čech, R., & Milička, J.: Length of non-projective sentences: A pilot study using a Czech UD treebank. In //Proceedings of the First Workshop on Quantitative Syntax (Quasy, SyntaxFest 2019)// (pp. 110–117). ISBN 978-1-950737-65-9.
  * Čech, R., Hůla, J., Kubát, M., Chen, X., & Milička, J.: The development of context specificity of lemma. A word embeddings approach. //Journal of Quantitative Linguistics, 26(3)//, 187–204.
  * Hůla, J., Kubát, M., Čech, R., Chen, X., Číž, D., Pelegrinová, K., & Milička, J.: Context Specificity of Lemma. Diachronic Analysis. //Glottometrics, 45//, 7.

==== 2018 ====
  * Juola, P., Milička, J., & Zemánek, P.: Authorship and time attribution of Arabic texts using JGAAP. In K. Shaalan, A. E. Hassanien & F. Tolba (eds.), //Intelligent Natural Language Processing: Trends and Applications// (pp. 325–349). Springer, Cham. ISBN 978-3-319-67056-0.
  * Milička, J.: Average Word Length from the Diachronic Perspective: The Case of Arabic. //Linguistic Frontiers, 1(2)//, 81–89.
  * Milička, J., & Kalábová, H.: Vowel Disharmony in Czech Words and Stems. In M. Fidler & V. Cvrček (eds.), //Taming the Corpus: From Inflection and Lexis to Interpretation// (pp. 37–61). Springer, Cham. ISBN 978-3-319-98017-1.
  * Čech, R., Milička, J., Mačutek, J., Koščová, M., & Lopatková, M.: Quantitative Analysis of Syntactic Dependency in Czech. In J. Jiang & H. Liu (eds.), //Quantitative Analysis of Dependency Structures// (pp. 53–70). ISBN 978-3-11-057356-5.

==== 2017 ====
  * Diatka, V., & Milička, J.: The effect of iconicity flash blindness: An empirical study. In A. Zirker, M. Bauer, O. Fischer & C. Ljungberg (eds.), //Dimensions of Iconicity// (pp. 3–14). John Benjamins. ISBN 978-90-272-4351-5.
  * Mačutek, J., Čech, R., & Milička, J.: [[https://aclanthology.org/volumes/W17-65/|Menzerath-Altmann Law in Syntactic Dependency Structure]]. In S. Montemagni & J. Nivre (eds.), //Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017), September 18–20, 2017, Università di Pisa, Italy// (No. 139, pp. 100–107). Linköping University Electronic Press. ISBN 978-91-7685-467-9.

==== 2016 ====
  * Milička, J.: Key Length Motifs in Czech and Arabic Texts. In E. Kelih, R. Knight, J. Mačutek & A. Wilson (eds.), //Studies in Quantitative Linguistics 23// (pp. 27–42). RAM-Verlag. ISBN 978-3-942303-44-6.
  * Čéplö, S., Bátora, J., Benkato, A., Milička, J., Pereira, C., & Zemánek, P.: Mutual intelligibility of spoken Maltese, Libyan Arabic, and Tunisian Arabic functionally tested: A pilot study. //Folia Linguistica, 50(2)//, 583–628.
  * Zemánek, P., & Milička, J.: Restricted collocability and its use in Arabic Corpus Linguistics. In G. C. Pastor (ed.), //Computerised and Corpus-based Approaches to Phraseology: Monolingual and Multilingual Perspectives// (pp. 67–78). Tradulex. ISBN 978-2-9700736-5-9.

==== 2015 ====
  * Milička, J.: Synergetic Linguistics: Do We Need Better Explanatory Mechanism? //Glottotheory, 6(2)//, 291–298.
  * Milička, J.: Is the Distribution of L-Motifs Inherited from the Word Lengths Distribution? In G. K. Mikros & J. Mačutek (eds.), //Sequences in Language and Text// (pp. 133–146). De Gruyter. ISBN 978-3-11-036273-2.
  * Milička, J.: Is Menzerath’s Law a consequence of segment inventory inhomogeneity? //Czech and Slovak Linguistic Review, 2015(2)//, 62–71.

==== 2014 ====
  * Milička, J.: Menzerath’s law: the whole is greater than the sum of its parts. //Journal of Quantitative Linguistics, 21(2)//, 85–99.
  * Mikros, G., & Milička, J.: Distribution of the Menzerath’s law on the syllable level in Greek texts. In G. Altmann, R. Čech, J. Mačutek & L. Uhlířová (eds.), //Empirical approaches to text and language analysis// (pp. 180–189). RAM-Verlag. ISBN 978-3-942303-24-8.
  * Zemánek, P., & Milička, J.: [[https://aclanthology.org/volumes/W14-09/|Quotations, relevance and time depth: Medieval Arabic literature in grids and networks]]. In A. Feldman, A. Kazantseva & S. Szpakowicz (eds.), //Proceedings of the 3rd Workshop on Computational Linguistics for Literature (CLFL)// (pp. 17–24). ISBN 978-1-937284-88-6.
  * Zemánek, P., & Milička, J.: Ranking Search Results for Arabic Diachronic Corpora. Google-like search engine for (non) linguists. In A. Lakhouaja (ed.), //Proceedings of the 5th International Conference on Arabic Language Processing (CITALA 2014)// (pp. 73–78). Oujda.

==== 2013 ====
  * Kubát, M., & Milička, J.: Vocabulary richness measure in genres. //Journal of Quantitative Linguistics, 20(4)//, 339–349.
  * Milička, J.: Rank-frequency relation & type-token relation: Two sides of the same coin. In I. Obradović, E. Kelih & R. Köhler (eds.), //Methods and Applications of Quantitative Linguistics: Selected papers of the 8th International Conference on Quantitative Linguistics (QUALICO)// (pp. 163–171). ISBN 978-86-7466-465-0.

==== 2012 ====
  * Milička, J.: Minimal ratio: an exact metric for keywords, collocations etc. //Czech and Slovak Linguistic Review, 2012(1)//, 62–70.
  * Chromý, J., & Milička, J.: Experimentální zkoumání stylotvorných faktorů: první výstupy. //Naše řeč (Our Speech), 95(4)//, 181–187.

==== 2011 ====
  * Milička, J.: A Combinatorial Method for a Context Comparison. In E. Kelih, V. Levickij & Y. Matskulyak (eds.), //Issues in Quantitative Linguistics 2// (pp. 104–109). RAM-Verlag. ISBN 978-3-942303-07-1.
  * Milička, J.: [[https://www.birmingham.ac.uk/Documents/college-artslaw/corpus/conference-archives/2011/Paper193.pdf|Valency and Information Structure: A quantitative approach to from–to juxtaposition in Arabic]]. In //Proceedings of CLC 2011//.

==== 2010 ====
  * Milička, J.: Budování česko-arabského paralelního korpusu. In F. Čermák & J. Kocek (eds.), //Mnohojazyčný korpus Intercorp: Možnosti studia// (pp. 221–225). Nakladatelství Lidových novin. ISBN 978-80-7422-058-6.

==== 2009 ====
  * Milička, J.: Type-token & hapax-token relation: A combinatorial model. //Glottotheory, 2(1)//, 99–110.
  * Milička, J.: [[http://milicka.ff.cuni.cz/kestazeni/clanek_Novy_Orient.pdf|Knihtisk v dějinách islámské kultury]]. //Nový Orient, 64(2)//, 42–46.

===== Applications =====
  * [[http://alpha.korpus.cz|Alpha]]: A translator from natural language into CQL (see [[en:manualy:alpha|info]]).
  * [[http://milicka.cz/en/engrammer|Engrammer]]: A tool for exploratory analysis of collocations.
  * [[http://www.milicka.cz/keyworder|KeyWorder]]: A program that identifies keywords in a text using the minimal ratio.
  * [[http://www.milicka.cz/typetokener|TypeTokener]]: A program that measures the type-token relation, hapax-token relation, etc. of a selected text and then uses the measured type distribution to model these quantities back.
  * [[http://www.milicka.cz/lexicographerscalculator|Lexicographers' Calculator]]: A program for planning corpus size.
  * [[http://milicka.cz/tinfi|Tinfi]]: A program that highlights the parts of a text that stand out from the rest of it.
  * [[http://milicka.cz/blacksquare|BlackSquare]]: A program for simple linguistic (and other) experiments.
  * [[http://zumky.com/|Zumky]]: A communication tool for everyone who values their time, peace, and privacy.

===== Books =====
  * Zemánek, P., & Milička, J. (2017): Words Lost and Found: The Diachronic Dynamics of the Arabic Lexicon. RAM-Verlag. 234 p. ISBN 978-3-942303-45-3.
  * Zemánek, P., Milička, J., & Ondráš, F. (2017): Al-haraka baraka. Strukturně-variační pohled na středověká arabská přísloví a rčení. Univerzita Karlova, Filozofická fakulta. 167 p. ISBN 978-80-7308-749-4.

===== Theses =====
  * Milička, J. (2022): //[[http://milicka.cz/habilitace.pdf|Lexikální diverzita]]// (Habilitation thesis).
  * Milička, J. (2015): //[[http://milicka.cz/disertace.pdf|Teorie komunikace jakožto explanatorní princip přirozené víceúrovňové segmentace textů]]// (PhD thesis).

===== Reviews =====
  * Milička, J. (2014): Kontroverzní hranice jazykovědy aneb O syntagmatických očích Hany Karadžičové [Review of Kvantitativní analýza kontextu, by V. Cvrček]. //Naše řeč, (4–5)//, 300–304.
  * Milička, J. (2018): Kapitoly z korpusové versologie – cesta správným směrem [Review of Kapitoly z korpusové versologie, by P. Plecháč & R. Kolár]. //Česká literatura, 66(2)//, 286–289.

===== Presentations =====
  * 10/2024 (Dominika Kováříková, JM, Václav Cvrček, and Michal Láznička) Presentation //Unlocking Lexical Meaning through Grammatical Profiling// at the EURALEX conference (Cavtat, Croatia).
  * 6/2024 (JM, Anna Marklová, and Václav Cvrček) Presentation //Exploring register variation in human and machine-generated texts: A comparative analysis// at the ICAME conference (Vigo, Spain).
  * 6/2024 Presentation //Mechanical Corpus Linguist// at the 4EU+ AI Days conference (Prague, Czech Republic).
  * 5/2024 Presentation //Let’s Delve into the Intricate Tapestry of the Chatgptese// at the International Workshop on Corpus and Computational Linguistics (Ostrava, Czech Republic, invited).
  * 5/2024 Presentation //Exploring Habibi Corpus: Mapping latent space to real geographic space// at the AIDA conference (Valletta, Malta).
  * 2/2024 Presentation //Not Your Training Data – Not Your Culture: Exploring Variations in Gender Bias in Large Language Models// at the Gender, Technology, and Digital Cultures in the Middle East conference (Doha, Qatar, invited).
  * 11/2023 Presentation //Hledání v korpusech pomocí velkých jazykových modelů: příklady z lingvistiky a dalších oborů (Searching corpora with large language models: Examples from linguistics and other fields)// at the conference Humanitní a společenské vědy perspektivou Digital Humanities (Humanities and Social Sciences from a Digital Humanities Perspective) (Olomouc, invited).
  * 9/2023 Presentation //Our Timelines// at [[http://milicka.ff.cuni.cz/AIAL2023|AIAL2023]] (Towards AI-Aided Human-Supervised Linguistics, Prague, organizer).
  * 6/2023 Presentation //Modelling Menzerath’s Law with Gaussian Copula// at the QUALICO 2023 conference (Lausanne).
  * 6/2023 Presentation //A Guided Tour through the Labyrinth of Lexical Diversity// at the International Workshop on Corpus Stylistics and Stylometrics (Ostrava, invited).
  * 6/2023 (JM and Petr Zemánek) Poster //Principal Component Analysis of Written Arabic Dialects// at the Olinco 2023 conference (Olomouc, Best Poster Award).
  * 11/2022 (JM and Dominika Kováříková) Presentation //Jak vytěžit textová data Českého národního korpusu pomocí KonTextu (Textual data mining from the Czech National Corpus using KonText)// at the conference Digitální data perspektivou humanitního vědce (Digital Data from a Humanities Perspective) (Brno, hybrid, invited).
  * 11/2022 Presentation //Engrammer, nástroj na automatickou extrakci frazeologie (Engrammer, a tool for automatic extraction of phraseology)// at the workshop Vývoj elektronické lexikální databáze indoíránských jazyků a podpora zavádění moderních technologií do výuky jazyků (Development of an Electronic Lexical Database of Indo-Iranian Languages and Support for Introducing Modern Technologies into Language Teaching) (Prague, invited).
  * 5/2022 Presentation //The Menzerath-Altmann Law: Time to move on// at the 3rd Summer Workshop for Statistics in Linguistics (Trojanovice, invited).
  * 5/2022 Presentation //Measuring lexical diversity: The influence of lemmatization// at the colloquium SlavLingColl (Berlin, invited).
  * 9/2021 (JM, Václav Cvrček, and David Lukeš) Presentation //Unpacking Lexical Intertextuality – Number of Types Shared Among Texts// at the QUALICO conference (Tokyo, online).
  * 8/2021 (JM and Denisa Šebestová) Presentation //Human Friendly Corpus Query Language// at the ICAME conference (Dortmund).
  * 11/2019 Presentation //Engrammer – On the borders between language and other cultural phenomena that can be quantitatively analyzed via corpus// at the Corpus Driven Quantitative Linguistics Workshop (Ostrava, invited).
  * 9/2019 (JM and Denisa Šebestová) Presentation //Engrammer: Introducing a new tool for the identification of phraseological patterning. Demo and case study on Czech, English, and Arabic// at the EUROPHRAS conference (Málaga).
  * 8/2019 (Ján Mačutek, Radek Čech, and JM) Presentation //Length of non-projective sentences: A pilot study using a Czech UD treebank// at the Quasy workshop held during SyntaxFest 2019 (Paris).
  * 7/2019 (JM, Václav Cvrček, and Lucie Lukešová) Presentation //N-gram Length Correspondence in Typologically Different Languages// at the CL2019 conference (Cardiff).
  * 6/2019 (Denisa Šebestová, Markéta Malá, and JM) Presentation //The expression of time in English and Czech children’s literature: A contrastive phraseological perspective// at the ICAME conference (Neuchâtel).
  * 3/2019 Presentation //Analysis of Liberal Translations and Cross-Language Plagiarism// at the Linguistic Afternoon 2019 meeting (Olomouc, invited).
  * 9/2018 (JM and Alžběta Růžičková) Presentation //Slovak Vowel Phonotactics: Slavic Origins vs. Hungarian Influences// at the SlaviCorp conference (Prague).
  * 7/2018 (JM and Alžběta Růžičková) Presentation //Demand and Supply in the Communication Process: The Case of Lexical Richness and Phonological Features// at the QUALICO conference (Wrocław).
  * 9/2017 (Ján Mačutek, Radek Čech, and JM) Presentation and poster //Menzerath-Altmann Law in Syntactic Dependency Structure// at the Depling conference (Pisa).
  * 5/2017 (JM and Hana Kalábová) Presentation //Vowel Disharmony in Czech: Description and Explanation// at the Lingvistika Praha (Linguistics Prague) conference.
  * 3/2017 Presentation //From–To Construction in Arabic and Czech// at the conference Word Order and Information Structure: A Cross- and Intra-Linguistic Perspective (Olomouc, invited).
  * 2/2017 Presentation //Menzerathův-Altmannův zákon: adorovaný model podivného vztahu (The Menzerath-Altmann Law: An idolised model of a strange relationship)// at the colloquium Kritické pohledy na Menzerathův-Altmannův zákon (Critical Views on the Menzerath-Altmann Law) (Ostrava, invited).
  * 8/2016 (JM and Karolína Vyskočilová) Presentation //Models of noisy channels that speech gets over// at the QUALICO conference (Trier).
  * 12/2015 (JM and Petr Zemánek) Presentation //Tolerant algorithm for quotation extraction// at the Digital Arabic and Persian Research Workshop (Leipzig, invited).
  * 11/2015 Poster //From Linguistic Theory to an Effective Quotation Extraction Algorithm// at the symposium Methods and Linguistic Theories (MaLT 2015) (Bamberg).
  * 10/2015 (Vojtěch Diatka and JM) Presentation //Můžou se neikonická slova někdy chovat jako ikonická? (Can non-iconic words sometimes behave like iconic ones?)// at the Lingvistika Praha (Linguistics Prague) conference.
  * 7/2015 (JM and Petr Zemánek) Poster //Hypertextualizer. Quotation Extraction Software// at the Corpus Linguistics 2015 conference (Lancaster).
  * 7/2015 (Vojtěch Diatka and JM) Poster //The Iconicity of the "Non-Iconic Words" and its Effects on Language Processing// at the 12th International Symposium of Psycholinguistics (Valencia).
  * 6/2015 (JM and Petr Zemánek) Presentation //Restricted Collocability and its Use in Arabic Corpus Linguistics// at the EUROPHRAS 2015 conference (Málaga).
  * 3/2015 (Vojtěch Diatka and JM) Presentation //Are Iconic Words Statistically more Iconic than Non-Iconic Ones? A New Method of Testing// at the 10th International Symposium on Iconicity in Language and Literature (Tübingen).
  * 6/2014 Presentation //Three Models for the Menzerath's Law// at the QUALICO conference (organized by IQLA).
  * 5/2014 Presentation //Konfidenční intervaly v empirické lingvistice (Confidence intervals in empirical linguistics)// at the Lingvistika Praha (Linguistics Prague) conference.
  * 4/2014 (JM and Petr Zemánek) Presentation //Quotations, Relevance, and Time Depth: Medieval Arabic Literature in Grids and Networks// at the EACL conference in Gothenburg (organized by the Association for Computational Linguistics).
  * 7/2012 Presentation //Rank-frequency Relation & Type-token Relation: Two Sides of the Same Coin// at the QUALICO conference.
  * 7/2011 Presentation //Valency and the Information Structure. A Quantitative Approach// at the Corpus Linguistics Conference in Birmingham.
  * 4/2011 (Petr Zemánek and JM) Presentation //Arabic Plurals in Context. A Corpus Study// at the Workshop on Arabic Corpus Linguistics in Lancaster.
  * 9/2009 Presentation //Budování česko-arabského paralelního korpusu (Building the Czech-Arabic Parallel Corpus)// at the Intercorp conference in Prague.

===== Translations into Czech =====
  * Muntasir al-Qaffáš: On. In //Antologie moderních arabských povídek//. Praha, 2011, pp. 93–97.
  * (Translated with Anna Humlová) Alí ad-Du'áží: //Po hospodách kolem Středozemního moře//. Praha: Malvern, 2013, 76 p.

===== Teaching =====
  * Previously taught:
    * Arabic and Corpus
    * Introduction to Quantitative Linguistics
    * Writing an article on a corpus-linguistic topic
  * Currently taught:
    * General Linguistic Laws in Texts
    * Use of Large Language Models
  * Currently involved in:
    * Working with corpora: Case studies
    * Introduction to linguistic corpora

===== Internships =====
  * 4/2013–6/2013 Internship at the University of Trier.
  * 10/2013–11/2013 Internship at the National and Kapodistrian University of Athens.