cnk:intercorp:verze16ud [2024/10/01 10:20] – [To nejdůležitější o verzi 16ud] alexandrrosen | cnk:intercorp:verze16ud [2024/10/18 20:33] (aktuální) – [Odkazy – o korpusu InterCorp s anotací podle UD] alexandrrosen |
---|
===== Key facts about version 16ud ===== | ===== Key facts about version 16ud ===== |
| |
* A detailed description of how the UD annotation is used in the InterCorp corpus can be found under the entry [[pojmy:ud|Universal Dependencies]] in the [[pojmy:prehled_pojmu|ČNK Glossary of Terms]]. | * A detailed description of how the UD annotation is used in the InterCorp corpus can be found under the entry <fs large>**[[pojmy:ud|Universal Dependencies]]**</fs> in the [[pojmy:prehled_pojmu|ČNK Glossary of Terms]]. |
* Following version 13ud, 16ud is the second InterCorp release with linguistic annotation according to the [[pojmy:ud|Universal Dependencies]] standard. | * Following version 13ud, 16ud is the second InterCorp release with linguistic annotation according to the [[pojmy:ud|Universal Dependencies]] standard. |
* Version 16ud is the first ČNK corpus to include metrics of <fs large>**[[pojmy:syntakticka_komplexita|syntactic complexity]]**</fs> and <fs large>**[[https://wiki.korpus.cz/doku.php/cs:pojmy:lexikalni_bohatost#lexikalni_diverzita|lexical diversity]]**</fs>.((We thank Olga Nádvorníková, who initiated and led the enrichment of the corpus with measures of syntactic complexity and lexical diversity. We also thank Jiří Milička for valuable advice on selecting suitable measures of lexical diversity.)) | * Version 16ud is the first ČNK corpus to include metrics of <fs large>**[[pojmy:syntakticka_komplexita|syntactic complexity]]**</fs> and <fs large>**[[https://wiki.korpus.cz/doku.php/pojmy:lexikalni_bohatost#lexikalni_diverzita|lexical diversity]]**</fs>.((We thank Olga Nádvorníková, who initiated and led the extension of the corpus annotation with measures of syntactic complexity and lexical diversity. We also thank Jiří Milička for valuable advice on selecting suitable measures of lexical diversity.)) |
* Of the total of 62 languages (including Czech), **47 are linguistically annotated** in version 16ud; all of these languages also have **syntactic annotation**. | * Of the total of 62 languages (including Czech), **47 are linguistically annotated** in version 16ud; all of these languages also have **syntactic annotation**. |
* Texts in all languages are **annotated uniformly**, following the UD standard ([[https://universaldependencies.org|Universal Dependencies]]). | * Texts in all languages are **annotated uniformly**, following the UD standard ([[https://universaldependencies.org|Universal Dependencies]]). |
^ **TOTAL** ^ | 6 455 | 1 668 | | ^ **TOTAL** ^ | 6 455 | 1 668 | |
| |
In the tables below, the InterCorp core is divided by text type into fiction (**Core-fiction**), non-fiction (**Core-nonfiction**) and miscellaneous (**Core-misc**), a category comprising plays, poetry and children's literature). | In the tables below, the InterCorp core is divided by text type into fiction (**Core-fiction**), non-fiction (**Core-nonfiction**) and miscellaneous (**Core-misc**), a category comprising plays, poetry and children's literature. |
| |
==== Corpus size by collection ==== | ==== Corpus size by collection ==== |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fa|fa]]| 1| 6 556| 6 594,8| 32 635,9| 38 097,3| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fa|fa]]| 1| 6 556| 6 594,8| 32 635,9| 38 097,3| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fi|fi]]| 117| 116 660| 25 976,1| 123 357,7| 165 696,1| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fi|fi]]| 117| 116 660| 25 976,1| 123 357,7| 165 696,1| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fi|fi]]| 310| 138 571| 33 957,7| 258 555,1| 315 325,2| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fr|fr]]| 310| 138 571| 33 957,7| 258 555,1| 315 325,2| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fr|fr]]| 1| 146| 121,7| 622,1| 797,9| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=gl|gl]]| 1| 146| 121,7| 622,1| 797,9| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=gl|gl]]| 1| 33 935| 27 608,8| 129 458,6| 172 973,7| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=he|he]]| 1| 33 935| 27 608,8| 129 458,6| 172 973,7| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=he|he]]| 8| 61| 116,6| 832,7| 988,1| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hi|hi]]| 8| 61| 116,6| 832,7| 988,1| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hi|hi]]| 327| 35 447| 30 758,6| 162 943,8| 208 413,5| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hr|hr]]| 327| 35 447| 30 758,6| 162 943,8| 208 413,5| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hr|hr]]| 13| 13| 41,6| 466,3| 586,3| | ^[[https://en.wikipedia.org/wiki/Upper_Sorbian_language|hs]]| 13| 13| 41,6| 466,3| 586,3| |
^[[https://en.wikipedia.org/wiki/Upper_Sorbian_language|hs]]| 95| 125 933| 34 510,0| 178 525,6| 240 411,9| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hu|hu]]| 95| 125 933| 34 510,0| 178 525,6| 240 411,9| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hu|hu]]| 1| 7| 3,9| 23,5| 30,6| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hy|hy]]| 1| 7| 3,9| 23,5| 30,6| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hy|hy]]| 1| 8 350| 8 112,7| 37 824,9| 49 694,7| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=id|id]]| 1| 8 350| 8 112,7| 37 824,9| 49 694,7| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=id|id]]| 1| 1 135| 1 497,9| 7 374,2| 9 299,9| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=is|is]]| 1| 1 135| 1 497,9| 7 374,2| 9 299,9| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=is|is]]| 194| 134 401| 33 361,2| 226 224,9| 286 343,4| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=it|it]]| 194| 134 401| 33 361,2| 226 224,9| 286 343,4| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=it|it]]| 37| 2 363| 2 296,7| 16 138,6| 18 020,3| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ja|ja]]| 37| 2 363| 2 296,7| 16 138,6| 18 020,3| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ja|ja]]| 1| 204| 198,4| 871,1| 1 179,0| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ka|ka]]| 1| 204| 198,4| 871,1| 1 179,0| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ka|ka]]| 1| 4| 4,1| 13,9| 19,2| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=kk|kk]]| 1| 4| 4,1| 13,9| 19,2| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=kk|kk]]| 1| 1 605| 1 641,1| 5 964,3| 7 294,3| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ko|ko]]| 1| 1 605| 1 641,1| 5 964,3| 7 294,3| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ko|ko]]| 28| 87 642| 3 622,1| 34 786,3| 45 134,4| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lt|lt]]| 28| 87 642| 3 622,1| 34 786,3| 45 134,4| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lt|lt]]| 78| 86 356| 3 023,6| 35 425,1| 45 293,5| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lv|lv]]| 78| 86 356| 3 023,6| 35 425,1| 45 293,5| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lv|lv]]| 109| 3 541| 3 907,8| 23 993,1| 30 898,6| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mk|mk]]| 109| 3 541| 3 907,8| 23 993,1| 30 898,6| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mk|mk]]| 1| 285| 365,3| 1 258,4| 1 793,5| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ml|ml]]| 1| 285| 365,3| 1 258,4| 1 793,5| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ml|ml]]| 1| 1 496| 1 712,1| 7 828,0| 10 573,3| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ms|ms]]| 1| 1 496| 1 712,1| 7 828,0| 10 573,3| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ms|ms]]| 1| 8 963| 784,8| 13 805,0| 16 643,6| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mt|mt]]| 1| 8 963| 784,8| 13 805,0| 16 643,6| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mt|mt]]| 232| 132 791| 33 065,4| 233 111,3| 284 402,6| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=nl|nl]]| 232| 132 791| 33 065,4| 233 111,3| 284 402,6| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=nl|nl]]| 105| 9 163| 8 344,6| 48 750,2| 61 120,3| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=no|no]]| 105| 9 163| 8 344,6| 48 750,2| 61 120,3| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=no|no]]| 360| 140 055| 41 282,4| 227 242,6| 300 207,8| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pl|pl]]| 360| 140 055| 41 282,4| 227 242,6| 300 207,8| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pl|pl]]| 107| 147 063| 46 510,1| 280 566,2| 355 121,8| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pt|pt]]| 107| 147 063| 46 510,1| 280 566,2| 355 121,8| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pt|pt]]| 2| 2| 1,7| 13,6| 17,7| | ^[[https://en.wikipedia.org/wiki/Romani_language|rn]]| 2| 2| 1,7| 13,6| 17,7| |
^[[https://en.wikipedia.org/wiki/Romani_language|rn]]| 55| 102 904| 39 561,2| 235 702,3| 295 301,3| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ro|ro]]| 55| 102 904| 39 561,2| 235 702,3| 295 301,3| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ro|ro]]| 184| 32 839| 22 985,2| 122 130,4| 163 120,7| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ru|ru]]| 184| 32 839| 22 985,2| 122 130,4| 163 120,7| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=si|si]]| 1| 499| 522,5| 2 313,4| 3 021,8| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=si|si]]| 1| 499| 522,5| 2 313,4| 3 021,8| |
==== Corpus size in thousands of words by language and collection ==== | ==== Corpus size in thousands of words by language and collection ==== |
| |
^ [[https://en.wikipedia.org/wiki/ISO_639-1|Lang]] ^ Core-fiction ^ Core-misc ^ Core-nonfiction ^ Acquis ^ Bible ^ Europarl ^ PressEurop ^ Subtitles ^ Syndicate ^ TOTAL ^ | ^ [[https://en.wikipedia.org/wiki/ISO_639-1|Language]] ^ Core-fiction ^ Core-misc ^ Core-nonfiction ^ Acquis ^ Bible ^ Europarl ^ PressEurop ^ Subtitles ^ Syndicate ^ TOTAL ^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=af|af]]| – | – | – | – | – | – | – | 134,6| – ^ 134,6^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=af|af]]| – | – | – | – | – | – | – | 134,6| – ^ 134,6^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ar|ar]]| 28,8| 5,5| – | – | – | – | – | 126 195,5| 384,5^ 126 614,3^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ar|ar]]| 28,8| 5,5| – | – | – | – | – | 126 195,5| 384,5^ 126 614,3^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fa|fa]]| – | – | – | – | – | – | – | 32 635,9| – ^ 32 635,9^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fa|fa]]| – | – | – | – | – | – | – | 32 635,9| – ^ 32 635,9^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fi|fi]]| 6 714,9| 44,4| 200,5| 15 264,2| 542,6| 10 109,3| – | 90 481,8| – ^ 123 357,7^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fi|fi]]| 6 714,9| 44,4| 200,5| 15 264,2| 542,6| 10 109,3| – | 90 481,8| – ^ 123 357,7^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fi|fi]]| 20 454,4| 194,3| 3 687,5| 26 298,4| 762,6| 17 186,4| 3 044,3| 181 033,4| 5 893,7^ 258 555,1^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fr|fr]]| 20 454,4| 194,3| 3 687,5| 26 298,4| 762,6| 17 186,4| 3 044,3| 181 033,4| 5 893,7^ 258 555,1^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fr|fr]]| – | – | – | – | – | – | – | 622,1| – ^ 622,1^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=gl|gl]]| – | – | – | – | – | – | – | 622,1| – ^ 622,1^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=gl|gl]]| – | – | – | – | – | – | – | 129 458,6| – ^ 129 458,6^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=he|he]]| – | – | – | – | – | – | – | 129 458,6| – ^ 129 458,6^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=he|he]]| 402,8| – | – | – | – | – | – | 429,9| – ^ 832,7^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hi|hi]]| 402,8| – | – | – | – | – | – | 429,9| – ^ 832,7^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hi|hi]]| 22 763,6| 242,6| 1 523,4| – | 569,9| – | – | 137 844,3| – ^ 162 943,8^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hr|hr]]| 22 763,6| 242,6| 1 523,4| – | 569,9| – | – | 137 844,3| – ^ 162 943,8^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hr|hr]]| 405,3| 36,6| 24,4| – | – | – | – | – | – ^ 466,3^ | ^[[https://en.wikipedia.org/wiki/Upper_Sorbian_language|hs]]| 405,3| 36,6| 24,4| – | – | – | – | – | – ^ 466,3^ |
^[[https://en.wikipedia.org/wiki/Upper_Sorbian_language|hs]]| 6 890,1| 28,9| – | 17 851,3| – | 12 187,9| – | 141 559,0| 8,4^ 178 525,6^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hu|hu]]| 6 890,1| 28,9| – | 17 851,3| – | 12 187,9| – | 141 559,0| 8,4^ 178 525,6^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hu|hu]]| – | – | – | – | – | – | – | 23,5| – ^ 23,5^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hy|hy]]| – | – | – | – | – | – | – | 23,5| – ^ 23,5^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hy|hy]]| – | – | – | – | – | – | – | 37 824,9| – ^ 37 824,9^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=id|id]]| – | – | – | – | – | – | – | 37 824,9| – ^ 37 824,9^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=id|id]]| – | – | – | – | – | – | – | 7 374,2| – ^ 7 374,2^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=is|is]]| – | – | – | – | – | – | – | 7 374,2| – ^ 7 374,2^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=is|is]]| 17 435,8| 50,6| 647,8| 23 892,0| 685,2| 15 511,4| 2 750,7| 163 859,9| 1 391,5^ 226 224,9^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=it|it]]| 17 435,8| 50,6| 647,8| 23 892,0| 685,2| 15 511,4| 2 750,7| 163 859,9| 1 391,5^ 226 224,9^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=it|it]]| 3 766,7| 64,9| 163,1| – | – | – | – | 12 141,5| 2,5^ 16 138,6^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ja|ja]]| 3 766,7| 64,9| 163,1| – | – | – | – | 12 141,5| 2,5^ 16 138,6^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ja|ja]]| – | – | – | – | – | – | – | 871,1| – ^ 871,1^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ka|ka]]| – | – | – | – | – | – | – | 871,1| – ^ 871,1^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ka|ka]]| – | – | – | – | – | – | – | 13,9| – ^ 13,9^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=kk|kk]]| – | – | – | – | – | – | – | 13,9| – ^ 13,9^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=kk|kk]]| – | – | – | – | – | – | – | 5 964,3| – ^ 5 964,3^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ko|ko]]| – | – | – | – | – | – | – | 5 964,3| – ^ 5 964,3^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ko|ko]]| 669,1| 7,2| 17,4| 17 175,1| 471,2| 11 198,5| – | 5 247,7| – ^ 34 786,3^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lt|lt]]| 669,1| 7,2| 17,4| 17 175,1| 471,2| 11 198,5| – | 5 247,7| – ^ 34 786,3^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lt|lt]]| 3 207,6| 362,1| 66,9| 17 519,4| 536,7| 11 682,0| – | 2 050,4| – ^ 35 425,1^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lv|lv]]| 3 207,6| 362,1| 66,9| 17 519,4| 536,7| 11 682,0| – | 2 050,4| – ^ 35 425,1^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lv|lv]]| 8 794,5| 86,5| – | – | – | – | – | 15 112,0| – ^ 23 993,1^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mk|mk]]| 8 794,5| 86,5| – | – | – | – | – | 15 112,0| – ^ 23 993,1^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mk|mk]]| – | – | – | – | – | – | – | 1 258,4| – ^ 1 258,4^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ml|ml]]| – | – | – | – | – | – | – | 1 258,4| – ^ 1 258,4^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ml|ml]]| – | – | – | – | – | – | – | 7 828,0| – ^ 7 828,0^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ms|ms]]| – | – | – | – | – | – | – | 7 828,0| – ^ 7 828,0^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ms|ms]]| – | – | – | 13 805,0| – | – | – | – | – ^ 13 805,0^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mt|mt]]| – | – | – | 13 805,0| – | – | – | – | – ^ 13 805,0^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mt|mt]]| 17 229,8| 356,4| 1 193,5| 23 401,1| 716,8| 15 555,9| 2 952,8| 170 892,9| 812,1^ 233 111,3^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=nl|nl]]| 17 229,8| 356,4| 1 193,5| 23 401,1| 716,8| 15 555,9| 2 952,8| 170 892,9| 812,1^ 233 111,3^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=nl|nl]]| 7 690,7| 138,1| 392,0| – | 723,9| – | – | 39 805,6| – ^ 48 750,2^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=no|no]]| 7 690,7| 138,1| 392,0| – | 723,9| – | – | 39 805,6| – ^ 48 750,2^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=no|no]]| 27 056,2| 283,2| 754,2| 19 482,9| 576,1| 12 662,8| 2 367,5| 164 059,8| – ^ 227 242,6^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pl|pl]]| 27 056,2| 283,2| 754,2| 19 482,9| 576,1| 12 662,8| 2 367,5| 164 059,8| – ^ 227 242,6^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pl|pl]]| 7 204,0| 81,3| – | 24 385,0| 706,2| 15 188,4| 2 782,5| 229 480,2| 738,5^ 280 566,2^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pt|pt]]| 7 204,0| 81,3| – | 24 385,0| 706,2| 15 188,4| 2 782,5| 229 480,2| 738,5^ 280 566,2^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pt|pt]]| 8,4| 5,2| – | – | – | – | – | – | – ^ 13,6^ | ^[[https://en.wikipedia.org/wiki/Romani_language|rn]]| 8,4| 5,2| – | – | – | – | – | – | – ^ 13,6^ |
^[[https://en.wikipedia.org/wiki/Romani_language|rn]]| 4 132,6| 64,1| – | 8 043,5| – | 9 426,4| 2 725,2| 211 310,4| – ^ 235 702,3^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ro|ro]]| 4 132,6| 64,1| – | 8 043,5| – | 9 426,4| 2 725,2| 211 310,4| – ^ 235 702,3^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ro|ro]]| 11 757,6| 143,8| 518,7| – | 565,5| – | – | 104 831,9| 4 312,8^ 122 130,4^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ru|ru]]| 11 757,6| 143,8| 518,7| – | 565,5| – | – | 104 831,9| 4 312,8^ 122 130,4^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=si|si]]| – | – | – | – | – | – | – | 2 313,4| – ^ 2 313,4^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=si|si]]| – | – | – | – | – | – | – | 2 313,4| – ^ 2 313,4^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sk|sk]]| 7 626,6| 402,2| 558,0| 18 398,8| 560,8| 12 727,0| – | 34 589,4| – ^ 74 862,7^ | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sk|sk]]| 7 626,6| 402,2| 558,0| 18 398,8| 560,8| 12 727,0| – | 34 589,4| – ^ 74 862,7^ |
^:::|Core-misc| 2| 2| 3,5| 44,4| 52,2| 733,0| 532,9| 12,820| 2,148| 1,051| 4,791| 1,821| 2,385| | ^:::|Core-misc| 2| 2| 3,5| 44,4| 52,2| 733,0| 532,9| 12,820| 2,148| 1,051| 4,791| 1,821| 2,385| |
^:::|Acquis| 1| 18 563| 1 310,5| 15 264,2| 19 702,1| 556,9| 380,4| 13,209| 2,369| 0,886| 6,990| 2,588| 2,647| | ^:::|Acquis| 1| 18 563| 1 310,5| 15 264,2| 19 702,1| 556,9| 380,4| 13,209| 2,369| 0,886| 6,990| 2,588| 2,647| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fi|fi]]|Bible| 2| 66| 48,0| 542,6| 675,3| 529,0| 351,4| 13,324| 1,911| 0,871| 4,231| 1,534| 2,511| | ^:::|Bible| 2| 66| 48,0| 542,6| 675,3| 529,0| 351,4| 13,324| 1,911| 0,871| 4,231| 1,534| 2,511| |
^:::|Europarl| 1| 67 019| 675,6| 10 109,3| 11 838,6| 670,8| 462,7| 15,260| 2,483| 1,242| 6,924| 2,670| 2,395| | ^:::|Europarl| 1| 67 019| 675,6| 10 109,3| 11 838,6| 670,8| 462,7| 15,260| 2,483| 1,242| 6,924| 2,670| 2,395| |
^:::|Subtitles| 1| 30 900| 23 262,2| 90 481,8| 124 969,7| 666,5| 444,7| 3,909| 1,244| 0,242| 1,404| 0,513| 1,689| | ^:::|Subtitles| 1| 30 900| 23 262,2| 90 481,8| 124 969,7| 666,5| 444,7| 3,909| 1,244| 0,242| 1,404| 0,513| 1,689| |
^:::|PressEurop| 7| 6 991| 160,6| 2 725,2| 3 192,6| 546,7| 429,5| 17,486| 2,219| 1,017| 8,508| 2,772| 2,492| | ^:::|PressEurop| 7| 6 991| 160,6| 2 725,2| 3 192,6| 546,7| 429,5| 17,486| 2,219| 1,017| 8,508| 2,772| 2,492| |
^:::|Subtitles| 1| 45 407| 38 108,1| 211 310,4| 266 731,5| 509,0| 351,2| 5,572| 1,388| 0,383| 2,129| 0,795| 1,954| | ^:::|Subtitles| 1| 45 407| 38 108,1| 211 310,4| 266 731,5| 509,0| 351,2| 5,572| 1,388| 0,383| 2,129| 0,795| 1,954| |
^:::|Core-nonfict| 10| 10| 30,6| 518,7| 625,2| 645,0| 495,9| 17,765| 2,613| 1,223| 8,126| 2,801| 2,603| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ru|ru]]|Core-nonfict| 10| 10| 30,6| 518,7| 625,2| 645,0| 495,9| 17,765| 2,613| 1,223| 8,126| 2,801| 2,603| |
^:::|Core-fiction| 144| 144| 1 043,5| 11 757,6| 14 913,7| 633,0| 501,9| 11,643| 1,959| 0,865| 4,203| 1,557| 2,386| | ^:::|Core-fiction| 144| 144| 1 043,5| 11 757,6| 14 913,7| 633,0| 501,9| 11,643| 1,959| 0,865| 4,203| 1,557| 2,386| |
^:::|Core-misc| 6| 6| 12,8| 143,8| 180,7| 633,2| 484,5| 11,439| 1,947| 0,870| 4,378| 1,718| 2,265| | ^:::|Core-misc| 6| 6| 12,8| 143,8| 180,7| 633,2| 484,5| 11,439| 1,947| 0,870| 4,378| 1,718| 2,265| |
| |
* [[http://ufal.mff.cuni.cz/udpipe|UDPipe]] (with thanks to Jana Straková, Milan Straka, Dan Zeman and Martin Popel) | * [[http://ufal.mff.cuni.cz/udpipe|UDPipe]] (with thanks to Jana Straková, Milan Straka, Dan Zeman and Martin Popel) |
| |
| ===== References on the InterCorp corpus with UD annotation ===== |
| |
| Olga Nádvorníková and Alexandr Rosen (2024): Vyhledávání v paralelním korpusu za použití anotace Universal Dependencies. [[https://www.youtube.com/watch?v=5l5Vbb1eQDw&t=190s|Workshop recording]] of 17 September 2024, an accompanying event of [[https://bcl2024.ff.cuni.cz|Bienále české lingvistiky 2024]]; see also the [[https://jakobson.korpus.cz/~rosen/BCL2024/P18_SLIDES/Prezentace_Bienale2024_WorkShop.pdf|slides]]. |
| |
| Alexandr Rosen (2024): Lexical and syntactic variability of languages and text genres – a corpus-based study. [[https://www.youtube.com/watch?v=E2ujmqt7Q2E|Lecture recording]] of 14 October 2024, [[https://zil.ipipan.waw.pl/seminarium|Seminarium „Przetwarzanie języka naturalnego”]] [[https://zil.ipipan.waw.pl|Zespołu Inżynierii Lingwistycznej]] w [[https://ipipan.waw.pl|Instytucie Podstaw Informatyki]] [[https://pan.pl|Polskiej Akademii Nauk]]; see also the [[https://zil.ipipan.waw.pl/seminarium-archiwum?action=AttachFile&do=view&target=2024-10-14.pdf|slides]]. |
| |
| Alexandr Rosen (2024): Exploring InterCorp v16ud: the potential of a multilingual parallel treebank with complexity and diversity metrics. Instytut Slawistyki Zachodniej i Południowej, Uniwersytet Warszawski. Warszawa, 10/06/2024. [[https://jakobson.korpus.cz/~rosen/INTERCORP/SLIDES/2024_UDCM_Wwa.pdf|Slides]] |
| |
| Olga Nádvorníková (2024): Analyse contrastive de la complexité syntaxique à l’aide de corpus parallèles. Translitteræ, Laboratoire LATTICE (Langues, Textes, Traitements informatiques et Cognition) – CNRS UMR 8094 (Centre national de la recherche scientifique: Unité mixte de recherche), ENS (L'École normale supérieure). Paris, 28/05/2024. [[https://www.youtube.com/watch?v=wJrCez_XPQY|Lecture recording]], [[https://jakobson.korpus.cz/~rosen/INTERCORP/SLIDES/C4%20Nadvornikova%20Analyse%20contrastiv%20e%20de%20la%20complexité%20syntaxique.pdf|slides]]. |
| |
| Olga Nádvorníková, Alexandr Rosen, Martin Stluka (2024): InterCorp a Universal Dependencies: nové možnosti výzkumu. Teoreticko-metodologický seminář Ústavu českého jazyka a teorie komunikace FF UK. Praha, 20/03/2024, 27/03/2024. [[https://docs.google.com/document/d/1nSPzyhT6oHKUDN8A_uYmWrZH6tAmxTH_pUMOdjg01Eg/edit?usp=sharing|Workshop programme with links to slides and recordings]] |
| |
| Alexandr Rosen (2023). The InterCorp parallel corpus with a uniform annotation for all languages. Jazykovedný časopis, 74(1):254–265. [[https://www.juls.savba.sk/ediela/jc/2023/1/jc23-01.pdf|Article]], [[https://jakobson.korpus.cz/~rosen/INTERCORP/SLIDES/rosen-slovko-2023.pdf|slides]] |
| |
| Olga Nádvorníková, Alexandr Rosen, Martin Vavřín (2021): InterCorp s jednotnou morfologickou a syntaktickou anotací podle Universal Dependencies: zážitky tvůrců a uživatelů. Praha, 16/11/2021. [[https://owncloud.korpus.cz/s/n3XSpYPpcMjbdC6|Lecture recording]], slides: [[https://owncloud.korpus.cz/s/aioW5oXt8Yo7tKp|creators' experiences]], [[https://owncloud.korpus.cz/s/8ALLEPbZnqbLodY|users' experiences]] |
| |
===== How to cite ===== | ===== How to cite ===== |