| en:cnk:intercorp:verze16ud [2024/10/11 11:03] – [Detailed statistics] alexandrrosen | en:cnk:intercorp:verze16ud [2026/04/07 22:45] (current) – [Main features of release 16ud] alexandrrosen |
|---|
| ^ ::: ^ publication date | 2024 ^^^^ | ^ ::: ^ publication date | 2024 ^^^^ |
| ^ ::: ^ foreign languages | 61 ^^^^ | ^ ::: ^ foreign languages | 61 ^^^^ |
| ^ ::: ^ tagged languages | 47 ^^^^ | ^ ::: ^ tagged languages | 48 ^^^^ |
| ^ ::: ^ lemmatized languages | 47 ^^^^ | ^ ::: ^ lemmatized languages | 48 ^^^^ |
| ^ ::: ^ syntactically annotated languages| 47 ^^^^ | ^ ::: ^ syntactically annotated languages| 48 ^^^^ |
| |
| ===== Access to the texts ===== | ===== Access to the texts ===== |
| * After 13ud, 16ud is the second release of InterCorp featuring linguistic annotation according to the [[en:pojmy:ud|Universal Dependencies]] scheme. | * After 13ud, 16ud is the second release of InterCorp featuring linguistic annotation according to the [[en:pojmy:ud|Universal Dependencies]] scheme. |
| * Release 16ud is the first CNC corpus to feature the metrics of <fs large>**[[en:pojmy:syntakticka_komplexita|syntactic complexity]]**</fs> and <fs large>**[[en:pojmy:lexikalni_bohatost|lexical diversity]]**</fs>.((We are grateful to Olga Nádvorníková, whose initiative and guidance made the addition of syntactic complexity and lexical diversity measures to the corpus metadata possible. We also thank Jiří Milička for his valuable advice on selecting appropriate lexical diversity measures.)) | * Release 16ud is the first CNC corpus to feature the metrics of <fs large>**[[en:pojmy:syntakticka_komplexita|syntactic complexity]]**</fs> and <fs large>**[[en:pojmy:lexikalni_bohatost|lexical diversity]]**</fs>.((We are grateful to Olga Nádvorníková, whose initiative and guidance made the addition of syntactic complexity and lexical diversity measures to the corpus metadata possible. We also thank Jiří Milička for his valuable advice on selecting appropriate lexical diversity measures.)) |
| * In release 16ud, out of the total number of 62 languages (including Czech), **47 are linguistically annotated**; in addition, all such languages are **syntactically annotated**. | * In release 16ud, out of the total number of 62 languages (including Czech), **48 are linguistically annotated**; in addition, all such languages are **syntactically annotated**. |
| * Texts are **annotated in the same way** in all languages, according to the UD standard ([[https://universaldependencies.org|Universal Dependencies]]). | * Texts are **annotated in the same way** in all languages, according to the UD standard ([[https://universaldependencies.org|Universal Dependencies]]). |
| * Annotation was performed for all languages by [[https://ufal.mff.cuni.cz/udpipe|UDPipe]], based on the data created in the UD project.((Annotation of this release used the following models: afrikaans-afribooms-ud-2.12-230717, | * Annotation was performed for all languages by [[https://ufal.mff.cuni.cz/udpipe|UDPipe]], based on the data created in the UD project.((Annotation of this release used the following models: afrikaans-afribooms-ud-2.12-230717, |
| |
| * Political commentaries published by [[http://www.project-syndicate.org/|Project Syndicate]] (below referred to as **Syndicate**) and [[http://www.voxeurop.eu|VoxEurop]] (formerly **PressEurop**) | * Political commentaries published by [[http://www.project-syndicate.org/|Project Syndicate]] (below referred to as **Syndicate**) and [[http://www.voxeurop.eu|VoxEurop]] (formerly **PressEurop**) |
|   * A package of legal texts of the European Union from the [[https://ec.europa.eu/jrc/en/language-technologies/jrc-acquis|Acquis Communautaire]] corpus (**Acquis**) |   * A collection of legal texts of the European Union from the [[https://ec.europa.eu/jrc/en/language-technologies/jrc-acquis|Acquis Communautaire]] corpus (**Acquis**) |
| * Proceedings of the European Parliament dated 2007–2011 from the [[http://www.statmt.org/europarl/|Europarl]] corpus (**Europarl**) | * Proceedings of the European Parliament dated 2007–2011 from the [[http://www.statmt.org/europarl/|Europarl]] corpus (**Europarl**) |
| * Film subtitles from the [[http://www.opensubtitles.org/|Open Subtitles]] database (**Subtitles**) | * Film subtitles from the [[http://www.opensubtitles.org/|Open Subtitles]] database (**Subtitles**) |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fa|fa]]| 1| 6 556| 6 594.8| 32 635.9| 38 097.3| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fa|fa]]| 1| 6 556| 6 594.8| 32 635.9| 38 097.3| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fi|fi]]| 117| 116 660| 25 976.1| 123 357.7| 165 696.1| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fi|fi]]| 117| 116 660| 25 976.1| 123 357.7| 165 696.1| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fi|fi]]| 310| 138 571| 33 957.7| 258 555.1| 315 325.2| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fr|fr]]| 310| 138 571| 33 957.7| 258 555.1| 315 325.2| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fr|fr]]| 1| 146| 121.7| 622.1| 797.9| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=gl|gl]]| 1| 146| 121.7| 622.1| 797.9| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=gl|gl]]| 1| 33 935| 27 608.8| 129 458.6| 172 973.7| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=he|he]]| 1| 33 935| 27 608.8| 129 458.6| 172 973.7| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=he|he]]| 8| 61| 116.6| 832.7| 988.1| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hi|hi]]| 8| 61| 116.6| 832.7| 988.1| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hi|hi]]| 327| 35 447| 30 758.6| 162 943.8| 208 413.5| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hr|hr]]| 327| 35 447| 30 758.6| 162 943.8| 208 413.5| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hr|hr]]| 13| 13| 41.6| 466.3| 586.3| | ^[[https://en.wikipedia.org/wiki/Upper_Sorbian_language|hs]]| 13| 13| 41.6| 466.3| 586.3| |
| ^[[https://en.wikipedia.org/wiki/Upper_Sorbian_language|hs]]| 95| 125 933| 34 510.0| 178 525.6| 240 411.9| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hu|hu]]| 95| 125 933| 34 510.0| 178 525.6| 240 411.9| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hu|hu]]| 1| 7| 3.9| 23.5| 30.6| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hy|hy]]| 1| 7| 3.9| 23.5| 30.6| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hy|hy]]| 1| 8 350| 8 112.7| 37 824.9| 49 694.7| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=id|id]]| 1| 8 350| 8 112.7| 37 824.9| 49 694.7| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=id|id]]| 1| 1 135| 1 497.9| 7 374.2| 9 299.9| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=is|is]]| 1| 1 135| 1 497.9| 7 374.2| 9 299.9| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=is|is]]| 194| 134 401| 33 361.2| 226 224.9| 286 343.4| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=it|it]]| 194| 134 401| 33 361.2| 226 224.9| 286 343.4| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=it|it]]| 37| 2 363| 2 296.7| 16 138.6| 18 020.3| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ja|ja]]| 37| 2 363| 2 296.7| 16 138.6| 18 020.3| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ja|ja]]| 1| 204| 198.4| 871.1| 1 179.0| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ka|ka]]| 1| 204| 198.4| 871.1| 1 179.0| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ka|ka]]| 1| 4| 4.1| 13.9| 19.2| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=kk|kk]]| 1| 4| 4.1| 13.9| 19.2| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=kk|kk]]| 1| 1 605| 1 641.1| 5 964.3| 7 294.3| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ko|ko]]| 1| 1 605| 1 641.1| 5 964.3| 7 294.3| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ko|ko]]| 28| 87 642| 3 622.1| 34 786.3| 45 134.4| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lt|lt]]| 28| 87 642| 3 622.1| 34 786.3| 45 134.4| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lt|lt]]| 78| 86 356| 3 023.6| 35 425.1| 45 293.5| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lv|lv]]| 78| 86 356| 3 023.6| 35 425.1| 45 293.5| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lv|lv]]| 109| 3 541| 3 907.8| 23 993.1| 30 898.6| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mk|mk]]| 109| 3 541| 3 907.8| 23 993.1| 30 898.6| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mk|mk]]| 1| 285| 365.3| 1 258.4| 1 793.5| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ml|ml]]| 1| 285| 365.3| 1 258.4| 1 793.5| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ml|ml]]| 1| 1 496| 1 712.1| 7 828.0| 10 573.3| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ms|ms]]| 1| 1 496| 1 712.1| 7 828.0| 10 573.3| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ms|ms]]| 1| 8 963| 784.8| 13 805.0| 16 643.6| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mt|mt]]| 1| 8 963| 784.8| 13 805.0| 16 643.6| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mt|mt]]| 232| 132 791| 33 065.4| 233 111.3| 284 402.6| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=nl|nl]]| 232| 132 791| 33 065.4| 233 111.3| 284 402.6| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=nl|nl]]| 105| 9 163| 8 344.6| 48 750.2| 61 120.3| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=no|no]]| 105| 9 163| 8 344.6| 48 750.2| 61 120.3| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=no|no]]| 360| 140 055| 41 282.4| 227 242.6| 300 207.8| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pl|pl]]| 360| 140 055| 41 282.4| 227 242.6| 300 207.8| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pl|pl]]| 107| 147 063| 46 510.1| 280 566.2| 355 121.8| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pt|pt]]| 107| 147 063| 46 510.1| 280 566.2| 355 121.8| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pt|pt]]| 2| 2| 1.7| 13.6| 17.7| | ^[[https://en.wikipedia.org/wiki/Romani_language|rn]]| 2| 2| 1.7| 13.6| 17.7| |
| ^[[https://en.wikipedia.org/wiki/Romani_language|rn]]| 55| 102 904| 39 561.2| 235 702.3| 295 301.3| | |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ro|ro]]| 184| 32 839| 22 985.2| 122 130.4| 163 120.7| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ro|ro]]| 184| 32 839| 22 985.2| 122 130.4| 163 120.7| |
| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ru|ru]]| 55| 102 904| 39 561.2| 235 702.3| 295 301.3| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=si|si]]| 1| 499| 522.5| 2 313.4| 3 021.8| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=si|si]]| 1| 499| 522.5| 2 313.4| 3 021.8| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sk|sk]]| 170| 94 585| 10 080.0| 74 862.7| 95 881.0| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sk|sk]]| 170| 94 585| 10 080.0| 74 862.7| 95 881.0| |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fa|fa]]| – | – | – | – | – | – | – | 32 635.9| – ^ 32 635.9^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fa|fa]]| – | – | – | – | – | – | – | 32 635.9| – ^ 32 635.9^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fi|fi]]| 6 714.9| 44.4| 200.5| 15 264.2| 542.6| 10 109.3| – | 90 481.8| – ^ 123 357.7^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fi|fi]]| 6 714.9| 44.4| 200.5| 15 264.2| 542.6| 10 109.3| – | 90 481.8| – ^ 123 357.7^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fi|fi]]| 20 454.4| 194.3| 3 687.5| 26 298.4| 762.6| 17 186.4| 3 044.3| 181 033.4| 5 893.7^ 258 555.1^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fr|fr]]| 20 454.4| 194.3| 3 687.5| 26 298.4| 762.6| 17 186.4| 3 044.3| 181 033.4| 5 893.7^ 258 555.1^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fr|fr]]| – | – | – | – | – | – | – | 622.1| – ^ 622.1^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=gl|gl]]| – | – | – | – | – | – | – | 622.1| – ^ 622.1^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=gl|gl]]| – | – | – | – | – | – | – | 129 458.6| – ^ 129 458.6^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=he|he]]| – | – | – | – | – | – | – | 129 458.6| – ^ 129 458.6^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=he|he]]| 402.8| – | – | – | – | – | – | 429.9| – ^ 832.7^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hi|hi]]| 402.8| – | – | – | – | – | – | 429.9| – ^ 832.7^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hi|hi]]| 22 763.6| 242.6| 1 523.4| – | 569.9| – | – | 137 844.3| – ^ 162 943.8^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hr|hr]]| 22 763.6| 242.6| 1 523.4| – | 569.9| – | – | 137 844.3| – ^ 162 943.8^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hr|hr]]| 405.3| 36.6| 24.4| – | – | – | – | – | – ^ 466.3^ | ^[[https://en.wikipedia.org/wiki/Upper_Sorbian_language|hs]]| 405.3| 36.6| 24.4| – | – | – | – | – | – ^ 466.3^ |
| ^[[https://en.wikipedia.org/wiki/Upper_Sorbian_language|hs]]| 6 890.1| 28.9| – | 17 851.3| – | 12 187.9| – | 141 559.0| 8.4^ 178 525.6^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hu|hu]]| 6 890.1| 28.9| – | 17 851.3| – | 12 187.9| – | 141 559.0| 8.4^ 178 525.6^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hu|hu]]| – | – | – | – | – | – | – | 23.5| – ^ 23.5^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hy|hy]]| – | – | – | – | – | – | – | 23.5| – ^ 23.5^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hy|hy]]| – | – | – | – | – | – | – | 37 824.9| – ^ 37 824.9^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=id|id]]| – | – | – | – | – | – | – | 37 824.9| – ^ 37 824.9^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=id|id]]| – | – | – | – | – | – | – | 7 374.2| – ^ 7 374.2^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=is|is]]| – | – | – | – | – | – | – | 7 374.2| – ^ 7 374.2^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=is|is]]| 17 435.8| 50.6| 647.8| 23 892.0| 685.2| 15 511.4| 2 750.7| 163 859.9| 1 391.5^ 226 224.9^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=it|it]]| 17 435.8| 50.6| 647.8| 23 892.0| 685.2| 15 511.4| 2 750.7| 163 859.9| 1 391.5^ 226 224.9^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=it|it]]| 3 766.7| 64.9| 163.1| – | – | – | – | 12 141.5| 2.5^ 16 138.6^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ja|ja]]| 3 766.7| 64.9| 163.1| – | – | – | – | 12 141.5| 2.5^ 16 138.6^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ja|ja]]| – | – | – | – | – | – | – | 871.1| – ^ 871.1^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ka|ka]]| – | – | – | – | – | – | – | 871.1| – ^ 871.1^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ka|ka]]| – | – | – | – | – | – | – | 13.9| – ^ 13.9^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=kk|kk]]| – | – | – | – | – | – | – | 13.9| – ^ 13.9^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=kk|kk]]| – | – | – | – | – | – | – | 5 964.3| – ^ 5 964.3^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ko|ko]]| – | – | – | – | – | – | – | 5 964.3| – ^ 5 964.3^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ko|ko]]| 669.1| 7.2| 17.4| 17 175.1| 471.2| 11 198.5| – | 5 247.7| – ^ 34 786.3^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lt|lt]]| 669.1| 7.2| 17.4| 17 175.1| 471.2| 11 198.5| – | 5 247.7| – ^ 34 786.3^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lt|lt]]| 3 207.6| 362.1| 66.9| 17 519.4| 536.7| 11 682.0| – | 2 050.4| – ^ 35 425.1^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lv|lv]]| 3 207.6| 362.1| 66.9| 17 519.4| 536.7| 11 682.0| – | 2 050.4| – ^ 35 425.1^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lv|lv]]| 8 794.5| 86.5| – | – | – | – | – | 15 112.0| – ^ 23 993.1^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mk|mk]]| 8 794.5| 86.5| – | – | – | – | – | 15 112.0| – ^ 23 993.1^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mk|mk]]| – | – | – | – | – | – | – | 1 258.4| – ^ 1 258.4^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ml|ml]]| – | – | – | – | – | – | – | 1 258.4| – ^ 1 258.4^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ml|ml]]| – | – | – | – | – | – | – | 7 828.0| – ^ 7 828.0^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ms|ms]]| – | – | – | – | – | – | – | 7 828.0| – ^ 7 828.0^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ms|ms]]| – | – | – | 13 805.0| – | – | – | – | – ^ 13 805.0^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mt|mt]]| – | – | – | 13 805.0| – | – | – | – | – ^ 13 805.0^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mt|mt]]| 17 229.8| 356.4| 1 193.5| 23 401.1| 716.8| 15 555.9| 2 952.8| 170 892.9| 812.1^ 233 111.3^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=nl|nl]]| 17 229.8| 356.4| 1 193.5| 23 401.1| 716.8| 15 555.9| 2 952.8| 170 892.9| 812.1^ 233 111.3^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=nl|nl]]| 7 690.7| 138.1| 392.0| – | 723.9| – | – | 39 805.6| – ^ 48 750.2^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=no|no]]| 7 690.7| 138.1| 392.0| – | 723.9| – | – | 39 805.6| – ^ 48 750.2^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=no|no]]| 27 056.2| 283.2| 754.2| 19 482.9| 576.1| 12 662.8| 2 367.5| 164 059.8| – ^ 227 242.6^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pl|pl]]| 27 056.2| 283.2| 754.2| 19 482.9| 576.1| 12 662.8| 2 367.5| 164 059.8| – ^ 227 242.6^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pl|pl]]| 7 204.0| 81.3| – | 24 385.0| 706.2| 15 188.4| 2 782.5| 229 480.2| 738.5^ 280 566.2^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pt|pt]]| 7 204.0| 81.3| – | 24 385.0| 706.2| 15 188.4| 2 782.5| 229 480.2| 738.5^ 280 566.2^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pt|pt]]| 8.4| 5.2| – | – | – | – | – | – | – ^ 13.6^ | ^[[https://en.wikipedia.org/wiki/Romani_language|rn]]| 8.4| 5.2| – | – | – | – | – | – | – ^ 13.6^ |
| ^[[https://en.wikipedia.org/wiki/Romani_language|rn]]| 4 132.6| 64.1| – | 8 043.5| – | 9 426.4| 2 725.2| 211 310.4| – ^ 235 702.3^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ro|ro]]| 4 132.6| 64.1| – | 8 043.5| – | 9 426.4| 2 725.2| 211 310.4| – ^ 235 702.3^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ro|ro]]| 11 757.6| 143.8| 518.7| – | 565.5| – | – | 104 831.9| 4 312.8^ 122 130.4^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ru|ru]]| 11 757.6| 143.8| 518.7| – | 565.5| – | – | 104 831.9| 4 312.8^ 122 130.4^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=si|si]]| – | – | – | – | – | – | – | 2 313.4| – ^ 2 313.4^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=si|si]]| – | – | – | – | – | – | – | 2 313.4| – ^ 2 313.4^ |
| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sk|sk]]| 7 626.6| 402.2| 558.0| 18 398.8| 560.8| 12 727.0| – | 34 589.4| – ^ 74 862.7^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sk|sk]]| 7 626.6| 402.2| 558.0| 18 398.8| 560.8| 12 727.0| – | 34 589.4| – ^ 74 862.7^ |
| |
| ===== References – about UD-annotated InterCorp ===== | ===== References – about UD-annotated InterCorp ===== |
| | |
| | Rosen, A. (2024): Lexical and syntactic variability of languages and text genres – a corpus-based study. [[https://www.youtube.com/watch?v=E2ujmqt7Q2E|Recording]] from 14 October 2024: [[https://zil.ipipan.waw.pl/|Natural Language Processing Seminar]] organised by the [[https://zil.ipipan.waw.pl|Linguistic Engineering Group]] at the [[https://ipipan.waw.pl|Institute of Computer Science]] [[https://pan.pl|Polish Academy of Sciences]], [[https://zil.ipipan.waw.pl/seminarium-archiwum?action=AttachFile&do=view&target=2024-10-14.pdf|slides]]. |
| |
| Olga Nádvorníková (2024): Analyse contrastive de la complexité syntaxique à l’aide de corpus parallèles. Translitteræ, Laboratoire LATTICE (Langues, Textes, Traitements informatiques et Cognition) – CNRS UMR 8094 (Centre national de la recherche scientifique: Unité mixte de recherche), ENS (L'École normale supérieure). Paris, 28/05/2024. [[https://www.youtube.com/watch?v=wJrCez_XPQY|Video]], [[https://jakobson.korpus.cz/~rosen/INTERCORP/SLIDES/C4%20Nadvornikova%20Analyse%20contrastiv%20e%20de%20la%20complexité%20syntaxique.pdf|slides]] | Olga Nádvorníková (2024): Analyse contrastive de la complexité syntaxique à l’aide de corpus parallèles. Translitteræ, Laboratoire LATTICE (Langues, Textes, Traitements informatiques et Cognition) – CNRS UMR 8094 (Centre national de la recherche scientifique: Unité mixte de recherche), ENS (L'École normale supérieure). Paris, 28/05/2024. [[https://www.youtube.com/watch?v=wJrCez_XPQY|Video]], [[https://jakobson.korpus.cz/~rosen/INTERCORP/SLIDES/C4%20Nadvornikova%20Analyse%20contrastiv%20e%20de%20la%20complexité%20syntaxique.pdf|slides]] |