en:cnk:intercorp:verze16ud [2024/09/24 00:12] – [Corpus size in thousands of words by language and collection] alexandrrosen | en:cnk:intercorp:verze16ud [2024/09/24 09:14] (current) – [Number of texts in the Core] alexandrrosen |
---|
InterCorp release 16ud contains the **same texts** as InterCorp release 16. They **differ only in linguistic annotation**. However, the token and word count data in 16ud may differ slightly due to a different tokenization method. | InterCorp release 16ud contains the **same texts** as InterCorp release 16. They **differ only in linguistic annotation**. However, the token and word count data in 16ud may differ slightly due to a different tokenization method. |
| |
The **core** of InterCorp consists of fiction, some non-fiction and a marginal share of other text types such as drama or poetry. The alignment of texts in the core is manually chacked. The other texts, grouped in **collections**, are aligned automatically without human intervention. The choice in the present release includes: | The **core** of InterCorp consists of fiction, some non-fiction and a marginal share of other text types such as drama or poetry. The alignment of texts in the core is manually checked. The other texts, grouped in **collections**, are aligned automatically without human intervention. The choice in the present release includes: |
| |
* Political commentaries published by [[http://www.project-syndicate.org/|Project Syndicate]] and [[http://www.voxeurop.eu|VoxEurop]] (formerly PressEurop) | * Political commentaries published by [[http://www.project-syndicate.org/|Project Syndicate]] (below referred to as **Syndicate**) and [[http://www.voxeurop.eu|VoxEurop]] (formerly **PressEurop**) |
  * A package of legal texts of the European Union from the [[https://ec.europa.eu/jrc/en/language-technologies/jrc-acquis|Acquis Communautaire]] corpus | * A package of legal texts of the European Union from the [[https://ec.europa.eu/jrc/en/language-technologies/jrc-acquis|Acquis Communautaire]] corpus (**Acquis**) |
* Proceedings of the European Parliament dated 2007–2011 from the [[http://www.statmt.org/europarl/|Europarl]] corpus | * Proceedings of the European Parliament dated 2007–2011 from the [[http://www.statmt.org/europarl/|Europarl]] corpus (**Europarl**) |
* Film subtitles from the [[http://www.opensubtitles.org/|Open Subtitles]] database | * Film subtitles from the [[http://www.opensubtitles.org/|Open Subtitles]] database (**Subtitles**) |
* Translations of the Bible | * Translations of the **Bible** |
| |
In texts aligned automatically without manual checking the search results may include a higher number of misaligned segments. Also, some collections do not retain all texts from the original resource. This includes texts that have no Czech counterpart. Some texts from the //Acquis Communautaire// and //Europarl// corpora have been partially corrected or omitted – as a result, they may differ in form or size if compared with the original source. A similar selection was applied to the //Open Subtitles// database, where – as an additional reduction – only a single translation was selected per title and language. On the other hand, some metadata items missing in the original resource but detectable from context or other sources have been added. | In texts aligned automatically without manual checking the search results may include a higher number of misaligned segments. Also, some collections do not retain all texts from the original resource. This includes texts that have no Czech counterpart. Some texts from the //Acquis Communautaire// and //Europarl// corpora have been partially corrected or omitted – as a result, they may differ in form or size if compared with the original source. A similar selection was applied to the //Open Subtitles// database, where – as an additional reduction – only a single translation was selected per title and language. On the other hand, some metadata items missing in the original resource but detectable from context or other sources have been added. |
===== The corpus in numbers ===== | ===== The corpus in numbers ===== |
| |
In the tables below, the Core part of the corpus is split according to the text type into fiction, non-fiction, and "misc" (for "miscellaneous", such as drama, poetry or children's literature). | ==== Number of texts in the Core ==== |
| |
| ^ Language ^^ Number of texts ^ including originals ^ |
| ^ ar ^ Arabic | 3 | 1 | |
| ^ be ^ Belarusian | 108 | 14 | |
| ^ bg ^ Bulgarian | 87 | 19 | |
| ^ ca ^ Catalan | 92 | 1 | |
| ^ cs ^ Czech | 1 812 | 368 | |
| ^ da ^ Danish | 93 | 9 | |
| ^ de ^ German | 471 | 163 | |
| ^ en ^ English | 422 | 271 | |
| ^ es ^ Spanish | 355 | 142 | |
| ^ et ^ Estonian | 1 | 0 | |
| ^ fi ^ Finnish | 112 | 36 | |
| ^ fr ^ French | 277 | 126 | |
| ^ hi ^ Hindi | 7 | 2 | |
| ^ hr ^ Croatian | 324 | 37 | |
| ^ hs ^ Upper Sorbian | 13 | 5 | |
| ^ hu ^ Hungarian | 89 | 1 | |
| ^ it ^ Italian | 171 | 26 | |
| ^ ja ^ Japanese | 35 | 15 | |
| ^ lt ^ Lithuanian | 23 | 4 | |
| ^ lv ^ Latvian | 73 | 15 | |
| ^ mk ^ Macedonian | 108 | 4 | |
| ^ nl ^ Dutch | 215 | 52 | |
| ^ no ^ Norwegian | 102 | 23 | |
| ^ pl ^ Polish | 348 | 54 | |
| ^ pt ^ Portuguese | 87 | 24 | |
| ^ rn ^ Romani | 2 | 2 | |
| ^ ro ^ Romanian | 45 | 5 | |
| ^ ru ^ Russian | 160 | 37 | |
| ^ sk ^ Slovak | 165 | 62 | |
| ^ sl ^ Slovene | 73 | 25 | |
| ^ sr ^ Serbian | 148 | 8 | |
| ^ sv ^ Swedish | 232 | 101 | |
| ^ uk ^ Ukrainian | 199 | 8 | |
| ^ zh ^ Chinese | 3 | 3 | |
| ^ **TOTAL** ^ | 6 495 | 1 668 | |
| |
| |
| In the tables below, the Core part of the corpus is split according to the text type into fiction (**Core-fiction**), non-fiction (**Core-nonfiction**), and miscellaneous (**Core-misc**, including drama, poetry or children's literature). |
| |
==== Corpus size by collection ==== | ==== Corpus size by collection ==== |
^Subtitles| 58| 965 557| 793 931| 3 970 273| 5 162 184| | ^Subtitles| 58| 965 557| 793 931| 3 970 273| 5 162 184| |
^Syndicate| 162| 39 158| 1 697| 35 385| 40 423| | ^Syndicate| 162| 39 158| 1 697| 35 385| 40 423| |
^TOTAL| 6 826| 2 831 743| 880 152| 5 256 601| 6 711 091| | ^TOTAL^ 6 826^ 2 831 743^ 880 152^ 5 256 601^ 6 711 091^ |
| |
| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=vi|vi]]| 1| 3 468| 3 304.5| 19 281.4| 23 984.0| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=vi|vi]]| 1| 3 468| 3 304.5| 19 281.4| 23 984.0| |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=zh|zh]]| 9| 12 035| 11 993.7| 71 855.3| 80 560.0| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=zh|zh]]| 9| 12 035| 11 993.7| 71 855.3| 80 560.0| |
^TOTAL| 6 826| 2 831 743| 880 152.2| 5 256 601.0| 6 711 091.0| | ^TOTAL^ 6 826^ 2 831 743^ 880 152.2^ 5 256 601.0^ 6 711 091.0^ |
| |
==== Corpus size in thousands of words by language and collection ==== | ==== Corpus size in thousands of words by language and collection ==== |
| |
^ [[https://en.wikipedia.org/wiki/ISO_639-1|Lang]] ^ Core-fiction ^ Core-misc ^ Core-nonfiction ^ Acquis ^ Bible ^ Europarl ^ PressEurop ^ Subtitles ^ Syndicate ^ TOTAL ^ | ^ [[https://en.wikipedia.org/wiki/ISO_639-1|Lang]] ^ Core-fiction ^ Core-misc ^ Core-nonfiction ^ Acquis ^ Bible ^ Europarl ^ PressEurop ^ Subtitles ^ Syndicate ^ TOTAL ^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=af|af]]| – | – | – | – | – | – | – | 134.6| – | 134.6| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=af|af]]| – | – | – | – | – | – | – | 134.6| – ^ 134.6^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ar|ar]]| 28.8| 5.5| – | – | – | – | – | 126 195.5| 384.5| 126 614.3| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ar|ar]]| 28.8| 5.5| – | – | – | – | – | 126 195.5| 384.5^ 126 614.3^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=be|be]]| 7 068.7| 57.7| – | – | – | – | – | – | – | 7 126.4| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=be|be]]| 7 068.7| 57.7| – | – | – | – | – | – | – ^ 7 126.4^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=bg|bg]]| 7 067.3| – | – | 13 582.3| – | 9 082.0| – | 164 644.1| – | 194 375.7| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=bg|bg]]| 7 067.3| – | – | 13 582.3| – | 9 082.0| – | 164 644.1| – ^ 194 375.7^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=bn|bn]]| – | – | – | – | – | – | – | 1 517.7| – | 1 517.7| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=bn|bn]]| – | – | – | – | – | – | – | 1 517.7| – ^ 1 517.7^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=br|br]]| – | – | – | – | – | – | – | 97.4| – | 97.4| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=br|br]]| – | – | – | – | – | – | – | 97.4| – ^ 97.4^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=bs|bs]]| – | – | – | – | – | – | – | 56 465.9| – | 56 465.9| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=bs|bs]]| – | – | – | – | – | – | – | 56 465.9| – ^ 56 465.9^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ca|ca]]| 9 951.3| 9.7| – | – | 728.2| – | – | 2 692.1| – | 13 381.4| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ca|ca]]| 9 951.3| 9.7| – | – | 728.2| – | – | 2 692.1| – ^ 13 381.4^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=cs|cs]]| 113 632.3| 2 637.1| 8 412.5| 19 188.9| 562.5| 12 918.7| 2 313.3| 232 969.1| 4 718.6| 397 352.9| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=cs|cs]]| 113 632.3| 2 637.1| 8 412.5| 19 188.9| 562.5| 12 918.7| 2 313.3| 232 969.1| 4 718.6^ 397 352.9^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=da|da]]| 9 460.8| 11.9| 56.0| 20 014.9| 655.2| 13 800.4| – | 71 590.8| – | 115 590.0| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=da|da]]| 9 460.8| 11.9| 56.0| 20 014.9| 655.2| 13 800.4| – | 71 590.8| – ^ 115 590.0^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=de|de]]| 35 653.3| 1 066.1| 4 037.3| 20 716.9| 725.0| 13 156.2| 2 506.5| 98 808.9| 5 103.7| 181 773.9| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=de|de]]| 35 653.3| 1 066.1| 4 037.3| 20 716.9| 725.0| 13 156.2| 2 506.5| 98 808.9| 5 103.7^ 181 773.9^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=el|el]]| – | – | – | 23 684.5| – | 15 381.7| – | 161 856.7| – | 200 922.9| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=el|el]]| – | – | – | 23 684.5| – | 15 381.7| – | 161 856.7| – ^ 200 922.9^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=en|en]]| 36 519.3| 778.3| 4 618.7| 23 062.9| 727.6| 15 593.0| 2 663.8| 267 843.8| 5 272.8| 357 080.3| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=en|en]]| 36 519.3| 778.3| 4 618.7| 23 062.9| 727.6| 15 593.0| 2 663.8| 267 843.8| 5 272.8^ 357 080.3^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=eo|eo]]| – | – | – | – | – | – | – | 221.0| – | 221.0| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=eo|eo]]| – | – | – | – | – | – | – | 221.0| – ^ 221.0^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=es|es]]| 29 664.1| 165.1| 830.9| 26 269.3| – | 16 248.5| 2 857.8| 223 006.0| 6 070.2| 305 112.0| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=es|es]]| 29 664.1| 165.1| 830.9| 26 269.3| – | 16 248.5| 2 857.8| 223 006.0| 6 070.2^ 305 112.0^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=et|et]]| 78.8| – | – | 14 884.2| – | 10 898.7| – | 54 487.7| – | 80 349.3| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=et|et]]| 78.8| – | – | 14 884.2| – | 10 898.7| – | 54 487.7| – ^ 80 349.3^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=eu|eu]]| – | – | – | – | – | – | – | 2 999.9| – | 2 999.9| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=eu|eu]]| – | – | – | – | – | – | – | 2 999.9| – ^ 2 999.9^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fa|fa]]| – | – | – | – | – | – | – | 32 635.9| – | 32 635.9| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fa|fa]]| – | – | – | – | – | – | – | 32 635.9| – ^ 32 635.9^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fi|fi]]| 6 714.9| 44.4| 200.5| 15 264.2| 542.6| 10 109.3| – | 90 481.8| – | 123 357.7| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fi|fi]]| 6 714.9| 44.4| 200.5| 15 264.2| 542.6| 10 109.3| – | 90 481.8| – ^ 123 357.7^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fr|fr]]| 20 454.4| 194.3| 3 687.5| 26 298.4| 762.6| 17 186.4| 3 044.3| 181 033.4| 5 893.7| 258 555.1| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fr|fr]]| 20 454.4| 194.3| 3 687.5| 26 298.4| 762.6| 17 186.4| 3 044.3| 181 033.4| 5 893.7^ 258 555.1^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=gl|gl]]| – | – | – | – | – | – | – | 622.1| – | 622.1| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=gl|gl]]| – | – | – | – | – | – | – | 622.1| – ^ 622.1^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=he|he]]| – | – | – | – | – | – | – | 129 458.6| – | 129 458.6| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=he|he]]| – | – | – | – | – | – | – | 129 458.6| – ^ 129 458.6^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hi|hi]]| 402.8| – | – | – | – | – | – | 429.9| – | 832.7| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hi|hi]]| 402.8| – | – | – | – | – | – | 429.9| – ^ 832.7^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hr|hr]]| 22 763.6| 242.6| 1 523.4| – | 569.9| – | – | 137 844.3| – | 162 943.8| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hr|hr]]| 22 763.6| 242.6| 1 523.4| – | 569.9| – | – | 137 844.3| – ^ 162 943.8^ |
^[[https://en.wikipedia.org/wiki/Upper_Sorbian_language|hs]]| 405.3| 36.6| 24.4| – | – | – | – | – | – | 466.3| | ^[[https://en.wikipedia.org/wiki/Upper_Sorbian_language|hs]]| 405.3| 36.6| 24.4| – | – | – | – | – | – ^ 466.3^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hu|hu]]| 6 890.1| 28.9| – | 17 851.3| – | 12 187.9| – | 141 559.0| 8.4| 178 525.6| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hu|hu]]| 6 890.1| 28.9| – | 17 851.3| – | 12 187.9| – | 141 559.0| 8.4^ 178 525.6^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hy|hy]]| – | – | – | – | – | – | – | 23.5| – | 23.5| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hy|hy]]| – | – | – | – | – | – | – | 23.5| – ^ 23.5^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=id|id]]| – | – | – | – | – | – | – | 37 824.9| – | 37 824.9| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=id|id]]| – | – | – | – | – | – | – | 37 824.9| – ^ 37 824.9^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=is|is]]| – | – | – | – | – | – | – | 7 374.2| – | 7 374.2| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=is|is]]| – | – | – | – | – | – | – | 7 374.2| – ^ 7 374.2^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=it|it]]| 17 435.8| 50.6| 647.8| 23 892.0| 685.2| 15 511.4| 2 750.7| 163 859.9| 1 391.5| 226 224.9| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=it|it]]| 17 435.8| 50.6| 647.8| 23 892.0| 685.2| 15 511.4| 2 750.7| 163 859.9| 1 391.5^ 226 224.9^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ja|ja]]| 3 766.7| 64.9| 163.1| – | – | – | – | 12 141.5| 2.5| 16 138.6| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ja|ja]]| 3 766.7| 64.9| 163.1| – | – | – | – | 12 141.5| 2.5^ 16 138.6^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ka|ka]]| – | – | – | – | – | – | – | 871.1| – | 871.1| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ka|ka]]| – | – | – | – | – | – | – | 871.1| – ^ 871.1^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=kk|kk]]| – | – | – | – | – | – | – | 13.9| – | 13.9| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=kk|kk]]| – | – | – | – | – | – | – | 13.9| – ^ 13.9^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ko|ko]]| – | – | – | – | – | – | – | 5 964.3| – | 5 964.3| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ko|ko]]| – | – | – | – | – | – | – | 5 964.3| – ^ 5 964.3^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lt|lt]]| 669.1| 7.2| 17.4| 17 175.1| 471.2| 11 198.5| – | 5 247.7| – | 34 786.3| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lt|lt]]| 669.1| 7.2| 17.4| 17 175.1| 471.2| 11 198.5| – | 5 247.7| – ^ 34 786.3^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lv|lv]]| 3 207.6| 362.1| 66.9| 17 519.4| 536.7| 11 682.0| – | 2 050.4| – | 35 425.1| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lv|lv]]| 3 207.6| 362.1| 66.9| 17 519.4| 536.7| 11 682.0| – | 2 050.4| – ^ 35 425.1^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mk|mk]]| 8 794.5| 86.5| – | – | – | – | – | 15 112.0| – | 23 993.1| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mk|mk]]| 8 794.5| 86.5| – | – | – | – | – | 15 112.0| – ^ 23 993.1^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ml|ml]]| – | – | – | – | – | – | – | 1 258.4| – | 1 258.4| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ml|ml]]| – | – | – | – | – | – | – | 1 258.4| – ^ 1 258.4^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ms|ms]]| – | – | – | – | – | – | – | 7 828.0| – | 7 828.0| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ms|ms]]| – | – | – | – | – | – | – | 7 828.0| – ^ 7 828.0^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mt|mt]]| – | – | – | 13 805.0| – | – | – | – | – | 13 805.0| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mt|mt]]| – | – | – | 13 805.0| – | – | – | – | – ^ 13 805.0^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=nl|nl]]| 17 229.8| 356.4| 1 193.5| 23 401.1| 716.8| 15 555.9| 2 952.8| 170 892.9| 812.1| 233 111.3| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=nl|nl]]| 17 229.8| 356.4| 1 193.5| 23 401.1| 716.8| 15 555.9| 2 952.8| 170 892.9| 812.1^ 233 111.3^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=no|no]]| 7 690.7| 138.1| 392.0| – | 723.9| – | – | 39 805.6| – | 48 750.2| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=no|no]]| 7 690.7| 138.1| 392.0| – | 723.9| – | – | 39 805.6| – ^ 48 750.2^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pl|pl]]| 27 056.2| 283.2| 754.2| 19 482.9| 576.1| 12 662.8| 2 367.5| 164 059.8| – | 227 242.6| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pl|pl]]| 27 056.2| 283.2| 754.2| 19 482.9| 576.1| 12 662.8| 2 367.5| 164 059.8| – ^ 227 242.6^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pt|pt]]| 7 204.0| 81.3| – | 24 385.0| 706.2| 15 188.4| 2 782.5| 229 480.2| 738.5| 280 566.2| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pt|pt]]| 7 204.0| 81.3| – | 24 385.0| 706.2| 15 188.4| 2 782.5| 229 480.2| 738.5^ 280 566.2^ |
^[[https://en.wikipedia.org/wiki/Romani_language|rn]]| 8.4| 5.2| – | – | – | – | – | – | – | 13.6| | ^[[https://en.wikipedia.org/wiki/Romani_language|rn]]| 8.4| 5.2| – | – | – | – | – | – | – ^ 13.6^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ro|ro]]| 4 132.6| 64.1| – | 8 043.5| – | 9 426.4| 2 725.2| 211 310.4| – | 235 702.3| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ro|ro]]| 4 132.6| 64.1| – | 8 043.5| – | 9 426.4| 2 725.2| 211 310.4| – ^ 235 702.3^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ru|ru]]| 11 757.6| 143.8| 518.7| – | 565.5| – | – | 104 831.9| 4 312.8| 122 130.4| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ru|ru]]| 11 757.6| 143.8| 518.7| – | 565.5| – | – | 104 831.9| 4 312.8^ 122 130.4^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=si|si]]| – | – | – | – | – | – | – | 2 313.4| – | 2 313.4| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=si|si]]| – | – | – | – | – | – | – | 2 313.4| – ^ 2 313.4^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sk|sk]]| 7 626.6| 402.2| 558.0| 18 398.8| 560.8| 12 727.0| – | 34 589.4| – | 74 862.7| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sk|sk]]| 7 626.6| 402.2| 558.0| 18 398.8| 560.8| 12 727.0| – | 34 589.4| – ^ 74 862.7^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sl|sl]]| 4 611.2| 6.1| 22.4| 18 510.4| – | 12 249.8| – | 83 057.1| – | 118 457.1| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sl|sl]]| 4 611.2| 6.1| 22.4| 18 510.4| – | 12 249.8| – | 83 057.1| – ^ 118 457.1^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sq|sq]]| – | – | – | – | – | – | – | 9 171.4| – | 9 171.4| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sq|sq]]| – | – | – | – | – | – | – | 9 171.4| – ^ 9 171.4^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sr|sr]]| 12 556.0| 29.3| 119.3| – | – | – | – | 152 425.6| – | 165 130.2| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sr|sr]]| 12 556.0| 29.3| 119.3| – | – | – | – | 152 425.6| – ^ 165 130.2^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sv|sv]]| 18 011.7| 454.8| 1 273.0| 19 443.0| 637.9| 13 777.6| – | 81 490.5| – | 135 088.4| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sv|sv]]| 18 011.7| 454.8| 1 273.0| 19 443.0| 637.9| 13 777.6| – | 81 490.5| – ^ 135 088.4^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ta|ta]]| – | – | – | – | – | – | – | 104.0| – | 104.0| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ta|ta]]| – | – | – | – | – | – | – | 104.0| – ^ 104.0^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=te|te]]| – | – | – | – | – | – | – | 96.0| – | 96.0| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=te|te]]| – | – | – | – | – | – | – | 96.0| – ^ 96.0^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=th|th]]| – | – | – | – | – | – | – | 5 626.0| – | 5 626.0| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=th|th]]| – | – | – | – | – | – | – | 5 626.0| – ^ 5 626.0^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=tl|tl]]| – | – | – | – | – | – | – | 37.0| – | 37.0| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=tl|tl]]| – | – | – | – | – | – | – | 37.0| – ^ 37.0^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=tr|tr]]| – | – | – | – | – | – | – | 147 635.3| – | 147 635.3| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=tr|tr]]| – | – | – | – | – | – | – | 147 635.3| – ^ 147 635.3^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=uk|uk]]| 14 478.3| 38.9| 333.0| – | 596.1| – | – | 3 779.0| – | 19 225.4| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=uk|uk]]| 14 478.3| 38.9| 333.0| – | 596.1| – | – | 3 779.0| – ^ 19 225.4^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ur|ur]]| – | – | – | – | – | – | – | 155.7| – | 155.7| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ur|ur]]| – | – | – | – | – | – | – | 155.7| – ^ 155.7^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=vi|vi]]| – | – | – | – | – | – | – | 19 281.4| – | 19 281.4| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=vi|vi]]| – | – | – | – | – | – | – | 19 281.4| – ^ 19 281.4^ |
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=zh|zh]]| 215.4| – | – | – | – | – | – | 70 963.9| 675.9| 71 855.3| | ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=zh|zh]]| 215.4| – | – | – | – | – | – | 70 963.9| 675.9^ 71 855.3^ |
^TOTAL| 473 208.2| 7 852.9| 29 450.5| 424 874.2| 12 050.1| 276 542.6| 26 964.4| 3 970 272.9| 35 385.2| 5 256 601.0| | ^TOTAL^ 473 208.2^ 7 852.9^ 29 450.5^ 424 874.2^ 12 050.1^ 276 542.6^ 26 964.4^ 3 970 272.9^ 35 385.2^ 5 256 601.0^ |
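The TOTAL column of the table above should equal the sum of the nine collection columns in the same row, up to rounding, since every cell is rounded to one decimal place. The following is a minimal sketch of such a consistency check, not part of the InterCorp tooling. The function names `parse_row` and `total_matches` and the tolerance of 0.5 are assumptions made for illustration. The sample row is the Czech (cs) row copied from the table.

```python
import re

# DokuWiki link markup: [[url|label]] -> keep only the label.
LINK = re.compile(r"\[\[[^|\]]+\|([^\]]+)\]\]")

# The cs row as it appears in the new revision of the table above.
CS_ROW = (
    "^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=cs|cs]]"
    "| 113 632.3| 2 637.1| 8 412.5| 19 188.9| 562.5| 12 918.7| 2 313.3"
    "| 232 969.1| 4 718.6^ 397 352.9^"
)


def parse_row(row: str) -> tuple[str, list[float]]:
    """Return (language label, numeric cells); an en dash cell counts as 0."""
    row = LINK.sub(r"\1", row)
    cells = [c.strip() for c in re.split(r"[|^]", row)]
    cells = [c for c in cells if c]
    label, values = cells[0], cells[1:]
    # Numbers use a space as the thousands separator, e.g. "113 632.3".
    numbers = [0.0 if v == "–" else float(re.sub(r"\s", "", v)) for v in values]
    return label, numbers


def total_matches(row: str, tol: float = 0.5) -> bool:
    """Check that the last cell equals the sum of the others within tol.

    tol is an assumed bound: nine addends and the total, each rounded
    to 0.1, can drift by at most 10 * 0.05 = 0.5.
    """
    _, numbers = parse_row(row)
    *parts, total = numbers
    return abs(sum(parts) - total) <= tol


if __name__ == "__main__":
    label, numbers = parse_row(CS_ROW)
    print(label, round(sum(numbers[:-1]), 1), numbers[-1], total_matches(CS_ROW))
```

For the cs row the nine collection cells add up to about 397 353.0 against a stated total of 397 352.9, so the row passes; a difference of this size comes from per-cell rounding rather than from a data error.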
==== Detailed statistics ==== | ==== Detailed statistics ==== |
| |
^:::|Subtitles| 1| 11 378| 11 952.3| 70 963.9| 79 539.4| 448.9| 439.5| 6.046| 1.689| 0.548| 2.081| 0.791| 2.289| | ^:::|Subtitles| 1| 11 378| 11 952.3| 70 963.9| 79 539.4| 448.9| 439.5| 6.046| 1.689| 0.548| 2.081| 0.791| 2.289| |
^:::|Syndicate| 5| 654| 29.7| 675.9| 766.7| 493.8| 489.5| 23.166| 4.110| 1.795| 7.026| 2.391| 3.366| | ^:::|Syndicate| 5| 654| 29.7| 675.9| 766.7| 493.8| 489.5| 23.166| 4.110| 1.795| 7.026| 2.391| 3.366| |
| |
==== Number of texts in the Core ==== | |
| |
^ Language ^^ Number of texts ^ including originals ^ | |
^ ar ^ Arabic | 3 | 1 | | |
^ be ^ Belarusian | 108 | 14 | | |
^ bg ^ Bulgarian | 87 | 19 | | |
^ ca ^ Catalan | 92 | 1 | | |
^ cs ^ Czech | 1 812 | 368 | | |
^ da ^ Danish | 93 | 9 | | |
^ de ^ German | 471 | 163 | | |
^ en ^ English | 422 | 271 | | |
^ es ^ Spanish | 355 | 142 | | |
^ et ^ Estonian | 1 | 0 | | |
^ fi ^ Finnish | 112 | 36 | | |
^ fr ^ French | 277 | 126 | | |
^ hi ^ Hindi | 7 | 2 | | |
^ hr ^ Croatian | 324 | 37 | | |
^ hs ^ Upper Sorbian | 13 | 5 | | |
^ hu ^ Hungarian | 89 | 1 | | |
^ it ^ Italian | 171 | 26 | | |
^ ja ^ Japanese | 35 | 15 | | |
^ lt ^ Lithuanian | 23 | 4 | | |
^ lv ^ Latvian | 73 | 15 | | |
^ mk ^ Macedonian | 108 | 4 | | |
^ nl ^ Dutch | 215 | 52 | | |
^ no ^ Norwegian | 102 | 23 | | |
^ pl ^ Polish | 348 | 54 | | |
^ pt ^ Portuguese | 87 | 24 | | |
^ rn ^ Romani | 2 | 2 | | |
^ ro ^ Romanian | 45 | 5 | | |
^ ru ^ Russian | 160 | 37 | | |
^ sk ^ Slovak | 165 | 62 | | |
^ sl ^ Slovene | 73 | 25 | | |
^ sr ^ Serbian | 148 | 8 | | |
^ sv ^ Swedish | 232 | 101 | | |
^ uk ^ Ukrainian | 199 | 8 | | |
^ zh ^ Chinese | 3 | 3 | | |
^ **TOTAL** ^ | 6 495 | 1 668 | | |
| |
| |
| |