| en:pojmy:lexikalni_bohatost [2024/09/30 18:20] – [Lexical Diversity] alexandrrosen | en:pojmy:lexikalni_bohatost [2024/10/18 21:07] (current) – [References] alexandrrosen |
|---|
| * For languages that are not linguistically annotated, only the measure counting word forms (''lexDivWord'') is available. For such languages its calculation is based on tokens rather than words, i.e. punctuation is not ignored. As a result, ''lexDivWord'' values for such texts may be lower than expected when compared with texts in linguistically annotated languages. | * For languages that are not linguistically annotated, only the measure counting word forms (''lexDivWord'') is available. For such languages its calculation is based on tokens rather than words, i.e. punctuation is not ignored. As a result, ''lexDivWord'' values for such texts may be lower than expected when compared with texts in linguistically annotated languages. |
| * In KonText, they can be displayed and queried like any other metadata items about a text, such as author or text ID. | * In KonText, they can be displayed and queried like any other metadata items about a text, such as author or text ID. |
| * | * Average values for all combinations of a language and a text type in InterCorp v16ud are shown in the table on [[https://wiki.korpus.cz/doku.php/en:cnk:intercorp:verze16ud#detailed_statistics|Detailed statistics]]. |
| * See also [[en:pojmy:syntakticka_komplexita|measures of syntactic complexity]]. | * See also [[en:pojmy:syntakticka_komplexita|measures of syntactic complexity]]. |
| |
| ===== References ===== | ===== References ===== |
| |
| [[https://docs.google.com/document/d/1nSPzyhT6oHKUDN8A_uYmWrZH6tAmxTH_pUMOdjg01Eg/edit?usp=sharing|InterCorp a Universal Dependencies: nové možnosti výzkumu]] (workshop held on 20 and 27 March 2024 as part of the Theoretical and Methodological Seminar of the Institute of Czech Language and Theory of Communication) | Alexandr Rosen (2024): Lexical and syntactic variability of languages and text genres – a corpus-based study. [[https://www.youtube.com/watch?v=E2ujmqt7Q2E|Recording]] of the talk given on 14 October 2024 at the [[https://zil.ipipan.waw.pl/|Natural Language Processing Seminar]], organised by the [[https://zil.ipipan.waw.pl|Linguistic Engineering Group]] at the [[https://ipipan.waw.pl|Institute of Computer Science]], [[https://pan.pl|Polish Academy of Sciences]]; [[https://zil.ipipan.waw.pl/seminarium-archiwum?action=AttachFile&do=view&target=2024-10-14.pdf|slides]]. |
| |
| [[https://drive.google.com/file/d/1L9yTjj0bTrGgf8lDcOAsJoJOoeYEoPEm/view?usp=sharing|Exploring InterCorp v16ud: the potential of a multilingual parallel treebank with complexity and diversity metrics]] (slides from the seminar at the University of Warsaw, 10 July 2024) | |
| | Alexandr Rosen (2024). [[https://drive.google.com/file/d/1L9yTjj0bTrGgf8lDcOAsJoJOoeYEoPEm/view?usp=sharing|Exploring InterCorp v16ud: the potential of a multilingual parallel treebank with complexity and diversity metrics]] (slides from the seminar at the University of Warsaw, 10 July 2024) |
| |
| |