en:cnk:intercorp:verze16ud [2024/10/11 11:08] – [Corpus size in thousands of words by language and collection] alexandrrosen | en:cnk:intercorp:verze16ud [2024/10/18 20:41] (current) – [References – about UD-annotated InterCorp] alexandrrosen |
---|
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fa|fa]]| 1| 6 556| 6 594.8| 32 635.9| 38 097.3| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fa|fa]]| 1| 6 556| 6 594.8| 32 635.9| 38 097.3| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fi|fi]]| 117| 116 660| 25 976.1| 123 357.7| 165 696.1| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fi|fi]]| 117| 116 660| 25 976.1| 123 357.7| 165 696.1| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fi|fi]]| 310| 138 571| 33 957.7| 258 555.1| 315 325.2| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fr|fr]]| 310| 138 571| 33 957.7| 258 555.1| 315 325.2| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fr|fr]]| 1| 146| 121.7| 622.1| 797.9| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=gl|gl]]| 1| 146| 121.7| 622.1| 797.9| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=gl|gl]]| 1| 33 935| 27 608.8| 129 458.6| 172 973.7| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=he|he]]| 1| 33 935| 27 608.8| 129 458.6| 172 973.7| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=he|he]]| 8| 61| 116.6| 832.7| 988.1| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hi|hi]]| 8| 61| 116.6| 832.7| 988.1| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hi|hi]]| 327| 35 447| 30 758.6| 162 943.8| 208 413.5| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hr|hr]]| 327| 35 447| 30 758.6| 162 943.8| 208 413.5| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hr|hr]]| 13| 13| 41.6| 466.3| 586.3| | ^[[https://en.wikipedia.org/wiki/Upper_Sorbian_language|hs]]| 13| 13| 41.6| 466.3| 586.3| |
^[[https://en.wikipedia.org/wiki/Upper_Sorbian_language|hs]]| 95| 125 933| 34 510.0| 178 525.6| 240 411.9| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hu|hu]]| 95| 125 933| 34 510.0| 178 525.6| 240 411.9| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hu|hu]]| 1| 7| 3.9| 23.5| 30.6| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hy|hy]]| 1| 7| 3.9| 23.5| 30.6| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hy|hy]]| 1| 8 350| 8 112.7| 37 824.9| 49 694.7| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=id|id]]| 1| 8 350| 8 112.7| 37 824.9| 49 694.7| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=id|id]]| 1| 1 135| 1 497.9| 7 374.2| 9 299.9| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=is|is]]| 1| 1 135| 1 497.9| 7 374.2| 9 299.9| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=is|is]]| 194| 134 401| 33 361.2| 226 224.9| 286 343.4| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=it|it]]| 194| 134 401| 33 361.2| 226 224.9| 286 343.4| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=it|it]]| 37| 2 363| 2 296.7| 16 138.6| 18 020.3| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ja|ja]]| 37| 2 363| 2 296.7| 16 138.6| 18 020.3| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ja|ja]]| 1| 204| 198.4| 871.1| 1 179.0| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ka|ka]]| 1| 204| 198.4| 871.1| 1 179.0| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ka|ka]]| 1| 4| 4.1| 13.9| 19.2| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=kk|kk]]| 1| 4| 4.1| 13.9| 19.2| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=kk|kk]]| 1| 1 605| 1 641.1| 5 964.3| 7 294.3| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ko|ko]]| 1| 1 605| 1 641.1| 5 964.3| 7 294.3| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ko|ko]]| 28| 87 642| 3 622.1| 34 786.3| 45 134.4| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lt|lt]]| 28| 87 642| 3 622.1| 34 786.3| 45 134.4| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lt|lt]]| 78| 86 356| 3 023.6| 35 425.1| 45 293.5| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lv|lv]]| 78| 86 356| 3 023.6| 35 425.1| 45 293.5| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lv|lv]]| 109| 3 541| 3 907.8| 23 993.1| 30 898.6| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mk|mk]]| 109| 3 541| 3 907.8| 23 993.1| 30 898.6| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mk|mk]]| 1| 285| 365.3| 1 258.4| 1 793.5| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ml|ml]]| 1| 285| 365.3| 1 258.4| 1 793.5| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ml|ml]]| 1| 1 496| 1 712.1| 7 828.0| 10 573.3| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ms|ms]]| 1| 1 496| 1 712.1| 7 828.0| 10 573.3| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ms|ms]]| 1| 8 963| 784.8| 13 805.0| 16 643.6| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mt|mt]]| 1| 8 963| 784.8| 13 805.0| 16 643.6| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mt|mt]]| 232| 132 791| 33 065.4| 233 111.3| 284 402.6| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=nl|nl]]| 232| 132 791| 33 065.4| 233 111.3| 284 402.6| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=nl|nl]]| 105| 9 163| 8 344.6| 48 750.2| 61 120.3| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=no|no]]| 105| 9 163| 8 344.6| 48 750.2| 61 120.3| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=no|no]]| 360| 140 055| 41 282.4| 227 242.6| 300 207.8| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pl|pl]]| 360| 140 055| 41 282.4| 227 242.6| 300 207.8| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pl|pl]]| 107| 147 063| 46 510.1| 280 566.2| 355 121.8| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pt|pt]]| 107| 147 063| 46 510.1| 280 566.2| 355 121.8| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pt|pt]]| 2| 2| 1.7| 13.6| 17.7| | ^[[https://en.wikipedia.org/wiki/Romani_language|rn]]| 2| 2| 1.7| 13.6| 17.7| |
^[[https://en.wikipedia.org/wiki/Romani_language|rn]]| 55| 102 904| 39 561.2| 235 702.3| 295 301.3| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ru|ru]]| 55| 102 904| 39 561.2| 235 702.3| 295 301.3| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ro|ro]]| 184| 32 839| 22 985.2| 122 130.4| 163 120.7| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ro|ro]]| 184| 32 839| 22 985.2| 122 130.4| 163 120.7| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=si|si]]| 1| 499| 522.5| 2 313.4| 3 021.8| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=si|si]]| 1| 499| 522.5| 2 313.4| 3 021.8| |
| |
===== References – about UD-annotated InterCorp ===== | ===== References – about UD-annotated InterCorp ===== |
| |
| Rosen, A. (2024): Lexical and syntactic variability of languages and text genres – a corpus-based study. [[https://www.youtube.com/watch?v=E2ujmqt7Q2E|Recording]] from 14 October 2024: [[https://zil.ipipan.waw.pl/|Natural Language Processing Seminar]] organised by the [[https://zil.ipipan.waw.pl|Linguistic Engineering Group]] at the [[https://ipipan.waw.pl|Institute of Computer Science]] [[https://pan.pl|Polish Academy of Sciences]], [[https://zil.ipipan.waw.pl/seminarium-archiwum?action=AttachFile&do=view&target=2024-10-14.pdf|slides]]. |
| |
Olga Nádvorníková (2024): Analyse contrastive de la complexité syntaxique à l’aide de corpus parallèles. Translitteræ, Laboratoire LATTICE (Langues, Textes, Traitements informatiques et Cognition) – CNRS UMR 8094 (Centre national de la recherche scientifique: Unité mixte de recherche), ENS (L'École normale supérieure). Paris, 28/05/2024. [[https://www.youtube.com/watch?v=wJrCez_XPQY|Video]], [[https://jakobson.korpus.cz/~rosen/INTERCORP/SLIDES/C4%20Nadvornikova%20Analyse%20contrastiv%20e%20de%20la%20complexité%20syntaxique.pdf|slides]] | Olga Nádvorníková (2024): Analyse contrastive de la complexité syntaxique à l’aide de corpus parallèles. Translitteræ, Laboratoire LATTICE (Langues, Textes, Traitements informatiques et Cognition) – CNRS UMR 8094 (Centre national de la recherche scientifique: Unité mixte de recherche), ENS (L'École normale supérieure). Paris, 28/05/2024. [[https://www.youtube.com/watch?v=wJrCez_XPQY|Video]], [[https://jakobson.korpus.cz/~rosen/INTERCORP/SLIDES/C4%20Nadvornikova%20Analyse%20contrastiv%20e%20de%20la%20complexité%20syntaxique.pdf|slides]] |