
Differences

This shows you the differences between two versions of the page.

en:cnk:intercorp:verze16ud [2024/09/24 00:12] – [Corpus size in thousands of words by language and collection] alexandrrosenen:cnk:intercorp:verze16ud [2024/09/24 09:14] (current) – [Number of texts in the Core] alexandrrosen
Line 73: Line 73:
 InterCorp release 16ud contains the **same texts** as InterCorp release 16. They **differ only in linguistic annotation**. However, the token and word count data in 16ud may differ slightly due to a different tokenization method. InterCorp release 16ud contains the **same texts** as InterCorp release 16. They **differ only in linguistic annotation**. However, the token and word count data in 16ud may differ slightly due to a different tokenization method.
  
-The **core** of InterCorp consists of fiction, some non-fiction and a marginal share of other text types such as drama or poetry. The alignment of texts in the core is manually chacked. The other texts, grouped in **collections**, are aligned automatically without human intervention. The choice in the present release includes:+The **core** of InterCorp consists of fiction, some non-fiction and a marginal share of other text types such as drama or poetry. The alignment of texts in the core is manually checked. The other texts, grouped in **collections**, are aligned automatically without human intervention. The choice in the present release includes:
  
-  * Political commentaries published by [[http://www.project-syndicate.org/|Project Syndicate]] and [[http://www.voxeurop.eu|VoxEurop]] (formerly PressEurop) +  * Political commentaries published by [[http://www.project-syndicate.org/|Project Syndicate]] (below referred to as **Syndicate**) and [[http://www.voxeurop.eu|VoxEurop]] (formerly **PressEurop**)
-  * A package of legal texts of the European Union from the [[https://ec.europa.eu/jrc/en/language-technologies/jrc-acquis|Acquis Communautaire]] corpus +  * A package of legal texts of the European Union from the [[https://ec.europa.eu/jrc/en/language-technologies/jrc-acquis|Acquis Communautaire]] corpus (**Acquis**) 
-  * Proceedings of the European Parliament dated 2007–2011 from the [[http://www.statmt.org/europarl/|Europarl]] corpus +  * Proceedings of the European Parliament dated 2007–2011 from the [[http://www.statmt.org/europarl/|Europarl]] corpus (**Europarl**) 
-  * Film subtitles from the [[http://www.opensubtitles.org/|Open Subtitles]] database +  * Film subtitles from the [[http://www.opensubtitles.org/|Open Subtitles]] database (**Subtitles**) 
-  * Translations of the Bible+  * Translations of the **Bible**
  
 In texts aligned automatically without manual checking the search results may include a higher number of misaligned segments. Also, some collections do not retain all texts from the original resource. This includes texts that have no Czech counterpart. Some texts from the //Acquis Communautaire// and //Europarl// corpora have been partially corrected or omitted – as a result, they may differ in form or size if compared with the original source. A similar selection was applied to the //Open Subtitles// database, where – as an additional reduction – only a single translation was selected per title and language. On the other hand, some metadata items missing in the original resource but detectable from context or other sources have been added. In texts aligned automatically without manual checking the search results may include a higher number of misaligned segments. Also, some collections do not retain all texts from the original resource. This includes texts that have no Czech counterpart. Some texts from the //Acquis Communautaire// and //Europarl// corpora have been partially corrected or omitted – as a result, they may differ in form or size if compared with the original source. A similar selection was applied to the //Open Subtitles// database, where – as an additional reduction – only a single translation was selected per title and language. On the other hand, some metadata items missing in the original resource but detectable from context or other sources have been added.
Line 94: Line 94:
 ===== The corpus in numbers ===== ===== The corpus in numbers =====
  
-In the tables below, the Core part of the corpus is split according to the text type into fiction, non-fiction, and "misc" (for "miscellaneous", such as drama, poetry or children's literature). +==== Number of texts in the Core ==== 
 + 
 +^  Language  ^^ Number of texts ^ including originals ^ 
 +^  ar  ^ Arabic |  3 |  1 | 
 +^  be  ^ Belarusian |  108 |  14 | 
 +^  bg  ^ Bulgarian |  87 |  19 | 
 +^  ca  ^ Catalan |  92 |  1 | 
 +^  cs  ^ Czech |  1 812 |  368 | 
 +^  da  ^ Danish |  93 |  9 | 
 +^  de  ^ German |  471 |  163 | 
 +^  en  ^ English |  422 |  271 | 
 +^  es  ^ Spanish |  355 |  142 | 
 +^  et  ^ Estonian |  1 |  0 | 
 +^  fi  ^ Finnish |  112 |  36 | 
 +^  fr  ^ French |  277 |  126 | 
 +^  hi  ^ Hindi |  7 |  2 | 
 +^  hr  ^ Croatian |  324 |  37 | 
 +^  hs  ^ Upper Sorbian |  13 |  5 | 
 +^  hu  ^ Hungarian |  89 |  1 | 
 +^  it  ^ Italian |  171 |  26 | 
 +^  ja  ^ Japanese |  35 |  15 | 
 +^  lt  ^ Lithuanian |  23 |  4 | 
 +^  lv  ^ Latvian |  73 |  15 | 
 +^  mk  ^ Macedonian |  108 |  4 | 
 +^  nl  ^ Dutch |  215 |  52 | 
 +^  no  ^ Norwegian |  102 |  23 | 
 +^  pl  ^ Polish |  348 |  54 | 
 +^  pt  ^ Portuguese |  87 |  24 | 
 +^  rn  ^ Romani |  2 |  2 | 
 +^  ro  ^ Romanian |  45 |  5 | 
 +^  ru  ^ Russian |  160 |  37 | 
 +^  sk  ^ Slovak |  165 |  62 | 
 +^  sl  ^ Slovene |  73 |  25 | 
 +^  sr  ^ Serbian |  148 |  8 | 
 +^  sv  ^ Swedish |  232 |  101 | 
 +^  uk  ^ Ukrainian |  199 |  8 | 
 +^  zh  ^ Chinese |  3 |  3 | 
 +^  **TOTAL**  ^    6 495 |  1 668 | 
 + 
 + 
 +In the tables below, the Core part of the corpus is split according to the text type into fiction (**Core-fiction**), non-fiction (**Core-nonfiction**), and miscellaneous (**Core-misc**, including drama, poetry or children's literature). 
  
 ==== Corpus size by collection ==== ==== Corpus size by collection ====
Line 109: Line 149:
 ^Subtitles|    58|   965 557|   793 931|  3 970 273|  5 162 184| ^Subtitles|    58|   965 557|   793 931|  3 970 273|  5 162 184|
 ^Syndicate|    162|   39 158|   1 697|   35 385|   40 423| ^Syndicate|    162|   39 158|   1 697|   35 385|   40 423|
-^TOTAL|  6 826|  2 831 743|  880 152|  5 256 601|  6 711 091|+^TOTAL^  6 826^  2 831 743^  880 152^  5 256 601^  6 711 091^
  
  
Line 178: Line 218:
 ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=vi|vi]]|    1|   3 468|   3 304.5|   19 281.4|   23 984.0| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=vi|vi]]|    1|   3 468|   3 304.5|   19 281.4|   23 984.0|
 ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=zh|zh]]|    9|   12 035|   11 993.7|   71 855.3|   80 560.0| ^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=zh|zh]]|    9|   12 035|   11 993.7|   71 855.3|   80 560.0|
-^TOTAL|  6 826|  2 831 743|  880 152.2|  5 256 601.0|  6 711 091.0|+^TOTAL^  6 826^  2 831 743^  880 152.2^  5 256 601.0^  6 711 091.0^
  
 ==== Corpus size in thousands of words by language and collection ==== ==== Corpus size in thousands of words by language and collection ====
  
 ^  [[https://en.wikipedia.org/wiki/ISO_639-1|Lang]]  ^ Core-fiction  ^  Core-misc  ^  Core-nonfiction  ^  Acquis  ^  Bible  ^  Europarl  ^  PressEurop  ^  Subtitles  ^  Syndicate  ^  TOTAL  ^ ^  [[https://en.wikipedia.org/wiki/ISO_639-1|Lang]]  ^ Core-fiction  ^  Core-misc  ^  Core-nonfiction  ^  Acquis  ^  Bible  ^  Europarl  ^  PressEurop  ^  Subtitles  ^  Syndicate  ^  TOTAL  ^
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=af|af]]|  – |  – |  – |  – |  – |  – |  – |     134.6|  –     134.6| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=af|af]]|  – |  – |  – |  – |  – |  – |  – |     134.6|  –     134.6^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ar|ar]]|     28.8|     5.5|  – |  – |  – |  – |  – |    126 195.5|     384.5   126 614.3| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ar|ar]]|     28.8|     5.5|  – |  – |  – |  – |  – |    126 195.5|     384.5   126 614.3^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=be|be]]|    7 068.7|     57.7|  – |  – |  – |  – |  – |  – |  –    7 126.4| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=be|be]]|    7 068.7|     57.7|  – |  – |  – |  – |  – |  – |  –    7 126.4^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=bg|bg]]|    7 067.3|  – |  – |    13 582.3|  – |    9 082.0|  – |    164 644.1|  –    194 375.7| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=bg|bg]]|    7 067.3|  – |  – |    13 582.3|  – |    9 082.0|  – |    164 644.1|  –    194 375.7^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=bn|bn]]|  – |  – |  – |  – |  – |  – |  – |    1 517.7|  –    1 517.7| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=bn|bn]]|  – |  – |  – |  – |  – |  – |  – |    1 517.7|  –    1 517.7^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=br|br]]|  – |  – |  – |  – |  – |  – |  – |     97.4|  –     97.4| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=br|br]]|  – |  – |  – |  – |  – |  – |  – |     97.4|  –     97.4^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=bs|bs]]|  – |  – |  – |  – |  – |  – |  – |    56 465.9|  –    56 465.9| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=bs|bs]]|  – |  – |  – |  – |  – |  – |  – |    56 465.9|  –    56 465.9^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ca|ca]]|    9 951.3|     9.7|  – |  – |     728.2|  – |  – |    2 692.1|  –    13 381.4| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ca|ca]]|    9 951.3|     9.7|  – |  – |     728.2|  – |  – |    2 692.1|  –    13 381.4^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=cs|cs]]|    113 632.3|    2 637.1|    8 412.5|    19 188.9|     562.5|    12 918.7|    2 313.3|    232 969.1|    4 718.6   397 352.9| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=cs|cs]]|    113 632.3|    2 637.1|    8 412.5|    19 188.9|     562.5|    12 918.7|    2 313.3|    232 969.1|    4 718.6   397 352.9^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=da|da]]|    9 460.8|     11.9|     56.0|    20 014.9|     655.2|    13 800.4|  – |    71 590.8|  –    115 590.0| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=da|da]]|    9 460.8|     11.9|     56.0|    20 014.9|     655.2|    13 800.4|  – |    71 590.8|  –    115 590.0^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=de|de]]|    35 653.3|    1 066.1|    4 037.3|    20 716.9|     725.0|    13 156.2|    2 506.5|    98 808.9|    5 103.7   181 773.9| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=de|de]]|    35 653.3|    1 066.1|    4 037.3|    20 716.9|     725.0|    13 156.2|    2 506.5|    98 808.9|    5 103.7   181 773.9^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=el|el]]|  – |  – |  – |    23 684.5|  – |    15 381.7|  – |    161 856.7|  –    200 922.9| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=el|el]]|  – |  – |  – |    23 684.5|  – |    15 381.7|  – |    161 856.7|  –    200 922.9^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=en|en]]|    36 519.3|     778.3|    4 618.7|    23 062.9|     727.6|    15 593.0|    2 663.8|    267 843.8|    5 272.8   357 080.3| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=en|en]]|    36 519.3|     778.3|    4 618.7|    23 062.9|     727.6|    15 593.0|    2 663.8|    267 843.8|    5 272.8   357 080.3^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=eo|eo]]|  – |  – |  – |  – |  – |  – |  – |     221.0|  –     221.0| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=eo|eo]]|  – |  – |  – |  – |  – |  – |  – |     221.0|  –     221.0^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=es|es]]|    29 664.1|     165.1|     830.9|    26 269.3|  – |    16 248.5|    2 857.8|    223 006.0|    6 070.2   305 112.0| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=es|es]]|    29 664.1|     165.1|     830.9|    26 269.3|  – |    16 248.5|    2 857.8|    223 006.0|    6 070.2   305 112.0^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=et|et]]|     78.8|  – |  – |    14 884.2|  – |    10 898.7|  – |    54 487.7|  –    80 349.3| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=et|et]]|     78.8|  – |  – |    14 884.2|  – |    10 898.7|  – |    54 487.7|  –    80 349.3^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=eu|eu]]|  – |  – |  – |  – |  – |  – |  – |    2 999.9|  –    2 999.9| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=eu|eu]]|  – |  – |  – |  – |  – |  – |  – |    2 999.9|  –    2 999.9^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fa|fa]]|  – |  – |  – |  – |  – |  – |  – |    32 635.9|  –    32 635.9| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fa|fa]]|  – |  – |  – |  – |  – |  – |  – |    32 635.9|  –    32 635.9^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fi|fi]]|    6 714.9|     44.4|     200.5|    15 264.2|     542.6|    10 109.3|  – |    90 481.8|  –    123 357.7| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fi|fi]]|    6 714.9|     44.4|     200.5|    15 264.2|     542.6|    10 109.3|  – |    90 481.8|  –    123 357.7^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fi|fi]]|    20 454.4|     194.3|    3 687.5|    26 298.4|     762.6|    17 186.4|    3 044.3|    181 033.4|    5 893.7   258 555.1| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fi|fi]]|    20 454.4|     194.3|    3 687.5|    26 298.4|     762.6|    17 186.4|    3 044.3|    181 033.4|    5 893.7   258 555.1^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fr|fr]]|  – |  – |  – |  – |  – |  – |  – |     622.1|  –     622.1| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=fr|fr]]|  – |  – |  – |  – |  – |  – |  – |     622.1|  –     622.1^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=gl|gl]]|  – |  – |  – |  – |  – |  – |  – |    129 458.6|  –    129 458.6| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=gl|gl]]|  – |  – |  – |  – |  – |  – |  – |    129 458.6|  –    129 458.6^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=he|he]]|     402.8|  – |  – |  – |  – |  – |  – |     429.9|  –     832.7| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=he|he]]|     402.8|  – |  – |  – |  – |  – |  – |     429.9|  –     832.7^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hi|hi]]|    22 763.6|     242.6|    1 523.4|  – |     569.9|  – |  – |    137 844.3|  –    162 943.8| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hi|hi]]|    22 763.6|     242.6|    1 523.4|  – |     569.9|  – |  – |    137 844.3|  –    162 943.8^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hr|hr]]|     405.3|     36.6|     24.4|  – |  – |  – |  – |  – |  –     466.3| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hr|hr]]|     405.3|     36.6|     24.4|  – |  – |  – |  – |  – |  –     466.3^ 
-^[[https://en.wikipedia.org/wiki/Upper_Sorbian_language|hs]]|    6 890.1|     28.9|  – |    17 851.3|  – |    12 187.9|  – |    141 559.0|     8.4   178 525.6| +^[[https://en.wikipedia.org/wiki/Upper_Sorbian_language|hs]]|    6 890.1|     28.9|  – |    17 851.3|  – |    12 187.9|  – |    141 559.0|     8.4   178 525.6^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hu|hu]]|  – |  – |  – |  – |  – |  – |  – |     23.5|  –     23.5| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hu|hu]]|  – |  – |  – |  – |  – |  – |  – |     23.5|  –     23.5^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hy|hy]]|  – |  – |  – |  – |  – |  – |  – |    37 824.9|  –    37 824.9| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=hy|hy]]|  – |  – |  – |  – |  – |  – |  – |    37 824.9|  –    37 824.9^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=id|id]]|  – |  – |  – |  – |  – |  – |  – |    7 374.2|  –    7 374.2| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=id|id]]|  – |  – |  – |  – |  – |  – |  – |    7 374.2|  –    7 374.2^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=is|is]]|    17 435.8|     50.6|     647.8|    23 892.0|     685.2|    15 511.4|    2 750.7|    163 859.9|    1 391.5   226 224.9| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=is|is]]|    17 435.8|     50.6|     647.8|    23 892.0|     685.2|    15 511.4|    2 750.7|    163 859.9|    1 391.5   226 224.9^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=it|it]]|    3 766.7|     64.9|     163.1|  – |  – |  – |  – |    12 141.5|     2.5   16 138.6| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=it|it]]|    3 766.7|     64.9|     163.1|  – |  – |  – |  – |    12 141.5|     2.5   16 138.6^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ja|ja]]|  – |  – |  – |  – |  – |  – |  – |     871.1|  –     871.1| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ja|ja]]|  – |  – |  – |  – |  – |  – |  – |     871.1|  –     871.1^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ka|ka]]|  – |  – |  – |  – |  – |  – |  – |     13.9|  –     13.9| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ka|ka]]|  – |  – |  – |  – |  – |  – |  – |     13.9|  –     13.9^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=kk|kk]]|  – |  – |  – |  – |  – |  – |  – |    5 964.3|  –    5 964.3| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=kk|kk]]|  – |  – |  – |  – |  – |  – |  – |    5 964.3|  –    5 964.3^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ko|ko]]|     669.1|     7.2|     17.4|    17 175.1|     471.2|    11 198.5|  – |    5 247.7|  –    34 786.3| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ko|ko]]|     669.1|     7.2|     17.4|    17 175.1|     471.2|    11 198.5|  – |    5 247.7|  –    34 786.3^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lt|lt]]|    3 207.6|     362.1|     66.9|    17 519.4|     536.7|    11 682.0|  – |    2 050.4|  –    35 425.1| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lt|lt]]|    3 207.6|     362.1|     66.9|    17 519.4|     536.7|    11 682.0|  – |    2 050.4|  –    35 425.1^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lv|lv]]|    8 794.5|     86.5|  – |  – |  – |  – |  – |    15 112.0|  –    23 993.1| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=lv|lv]]|    8 794.5|     86.5|  – |  – |  – |  – |  – |    15 112.0|  –    23 993.1^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mk|mk]]|  – |  – |  – |  – |  – |  – |  – |    1 258.4|  –    1 258.4| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mk|mk]]|  – |  – |  – |  – |  – |  – |  – |    1 258.4|  –    1 258.4^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ml|ml]]|  – |  – |  – |  – |  – |  – |  – |    7 828.0|  –    7 828.0| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ml|ml]]|  – |  – |  – |  – |  – |  – |  – |    7 828.0|  –    7 828.0^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ms|ms]]|  – |  – |  – |    13 805.0|  – |  – |  – |  – |  –    13 805.0| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ms|ms]]|  – |  – |  – |    13 805.0|  – |  – |  – |  – |  –    13 805.0^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mt|mt]]|    17 229.8|     356.4|    1 193.5|    23 401.1|     716.8|    15 555.9|    2 952.8|    170 892.9|     812.1   233 111.3| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=mt|mt]]|    17 229.8|     356.4|    1 193.5|    23 401.1|     716.8|    15 555.9|    2 952.8|    170 892.9|     812.1   233 111.3^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=nl|nl]]|    7 690.7|     138.1|     392.0|  – |     723.9|  – |  – |    39 805.6|  –    48 750.2| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=nl|nl]]|    7 690.7|     138.1|     392.0|  – |     723.9|  – |  – |    39 805.6|  –    48 750.2^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=no|no]]|    27 056.2|     283.2|     754.2|    19 482.9|     576.1|    12 662.8|    2 367.5|    164 059.8|  –    227 242.6| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=no|no]]|    27 056.2|     283.2|     754.2|    19 482.9|     576.1|    12 662.8|    2 367.5|    164 059.8|  –    227 242.6^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pl|pl]]|    7 204.0|     81.3|  – |    24 385.0|     706.2|    15 188.4|    2 782.5|    229 480.2|     738.5   280 566.2| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pl|pl]]|    7 204.0|     81.3|  – |    24 385.0|     706.2|    15 188.4|    2 782.5|    229 480.2|     738.5   280 566.2^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pt|pt]]|     8.4|     5.2|  – |  – |  – |  – |  – |  – |  –     13.6| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pt|pt]]|     8.4|     5.2|  – |  – |  – |  – |  – |  – |  –     13.6^ 
-^[[https://en.wikipedia.org/wiki/Romani_language|rn]]|    4 132.6|     64.1|  – |    8 043.5|  – |    9 426.4|    2 725.2|    211 310.4|  –    235 702.3| +^[[https://en.wikipedia.org/wiki/Romani_language|rn]]|    4 132.6|     64.1|  – |    8 043.5|  – |    9 426.4|    2 725.2|    211 310.4|  –    235 702.3^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ro|ro]]|    11 757.6|     143.8|     518.7|  – |     565.5|  – |  – |    104 831.9|    4 312.8   122 130.4| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ro|ro]]|    11 757.6|     143.8|     518.7|  – |     565.5|  – |  – |    104 831.9|    4 312.8   122 130.4^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=si|si]]|  – |  – |  – |  – |  – |  – |  – |    2 313.4|  –    2 313.4| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=si|si]]|  – |  – |  – |  – |  – |  – |  – |    2 313.4|  –    2 313.4^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sk|sk]]|    7 626.6|     402.2|     558.0|    18 398.8|     560.8|    12 727.0|  – |    34 589.4|  –    74 862.7| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sk|sk]]|    7 626.6|     402.2|     558.0|    18 398.8|     560.8|    12 727.0|  – |    34 589.4|  –    74 862.7^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sl|sl]]|    4 611.2|     6.1|     22.4|    18 510.4|  – |    12 249.8|  – |    83 057.1|  –    118 457.1| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sl|sl]]|    4 611.2|     6.1|     22.4|    18 510.4|  – |    12 249.8|  – |    83 057.1|  –    118 457.1^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sq|sq]]|  – |  – |  – |  – |  – |  – |  – |    9 171.4|  –    9 171.4| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sq|sq]]|  – |  – |  – |  – |  – |  – |  – |    9 171.4|  –    9 171.4^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sr|sr]]|    12 556.0|     29.3|     119.3|  – |  – |  – |  – |    152 425.6|  –    165 130.2| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sr|sr]]|    12 556.0|     29.3|     119.3|  – |  – |  – |  – |    152 425.6|  –    165 130.2^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sv|sv]]|    18 011.7|     454.8|    1 273.0|    19 443.0|     637.9|    13 777.6|  – |    81 490.5|  –    135 088.4| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sv|sv]]|    18 011.7|     454.8|    1 273.0|    19 443.0|     637.9|    13 777.6|  – |    81 490.5|  –    135 088.4^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ta|ta]]|  – |  – |  – |  – |  – |  – |  – |     104.0|  –     104.0| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ta|ta]]|  – |  – |  – |  – |  – |  – |  – |     104.0|  –     104.0^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=te|te]]|  – |  – |  – |  – |  – |  – |  – |     96.0|  –     96.0| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=te|te]]|  – |  – |  – |  – |  – |  – |  – |     96.0|  –     96.0^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=th|th]]|  – |  – |  – |  – |  – |  – |  – |    5 626.0|  –    5 626.0| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=th|th]]|  – |  – |  – |  – |  – |  – |  – |    5 626.0|  –    5 626.0^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=tl|tl]]|  – |  – |  – |  – |  – |  – |  – |     37.0|  –     37.0| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=tl|tl]]|  – |  – |  – |  – |  – |  – |  – |     37.0|  –     37.0^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=tr|tr]]|  – |  – |  – |  – |  – |  – |  – |    147 635.3|  –    147 635.3| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=tr|tr]]|  – |  – |  – |  – |  – |  – |  – |    147 635.3|  –    147 635.3^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=uk|uk]]|    14 478.3|     38.9|     333.0|  – |     596.1|  – |  – |    3 779.0|  –    19 225.4| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=uk|uk]]|    14 478.3|     38.9|     333.0|  – |     596.1|  – |  – |    3 779.0|  –    19 225.4^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ur|ur]]|  – |  – |  – |  – |  – |  – |  – |     155.7|  –     155.7| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ur|ur]]|  – |  – |  – |  – |  – |  – |  – |     155.7|  –     155.7^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=vi|vi]]|  – |  – |  – |  – |  – |  – |  – |    19 281.4|  –    19 281.4| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=vi|vi]]|  – |  – |  – |  – |  – |  – |  – |    19 281.4|  –    19 281.4^ 
-^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=zh|zh]]|     215.4|  – |  – |  – |  – |  – |  – |    70 963.9|     675.9   71 855.3| +^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=zh|zh]]|     215.4|  – |  – |  – |  – |  – |  – |    70 963.9|     675.9   71 855.3^ 
-TOTAL|  473 208.2|  7 852.9|  29 450.5|  424 874.2|  12 050.1|  276 542.6|  26 964.4|  3 970 272.9|  35 385.2|  5 256 601.0|+^TOTAL^  473 208.2^  7 852.9^  29 450.5^  424 874.2^  12 050.1^  276 542.6^  26 964.4^  3 970 272.9^  35 385.2^  5 256 601.0^
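
The TOTAL row of the table above can be cross-checked by adding up its nine collection columns. The following is only a minimal sketch of such a check, not part of the corpus tooling: the figures are copied by hand from the TOTAL row, and the names `COLLECTION_TOTALS`, `REPORTED_TOTAL`, `parse_count` and `sum_collections` are hypothetical helpers introduced here for illustration.

```python
from decimal import Decimal

# Column totals copied by hand from the TOTAL row (thousands of words).
# The dictionary and its name are illustrative assumptions, not corpus metadata.
COLLECTION_TOTALS = {
    "Core-fiction": "473 208.2",
    "Core-misc": "7 852.9",
    "Core-nonfiction": "29 450.5",
    "Acquis": "424 874.2",
    "Bible": "12 050.1",
    "Europarl": "276 542.6",
    "PressEurop": "26 964.4",
    "Subtitles": "3 970 272.9",
    "Syndicate": "35 385.2",
}

# Grand total as reported in the last column of the TOTAL row.
REPORTED_TOTAL = "5 256 601.0"


def parse_count(text: str) -> Decimal:
    """Convert a figure such as '3 970 272.9' to a Decimal.

    Whitespace used as a thousands separator (including non-breaking
    spaces) is removed before conversion.
    """
    return Decimal("".join(text.split()))


def sum_collections(totals: dict[str, str]) -> Decimal:
    """Add up the per-collection figures exactly, without float rounding."""
    return sum((parse_count(value) for value in totals.values()), Decimal("0"))


if __name__ == "__main__":
    computed = sum_collections(COLLECTION_TOTALS)
    difference = computed - parse_count(REPORTED_TOTAL)
    print(f"sum of collections: {computed}")
    print(f"difference from reported total: {difference}")
```

`Decimal` is used instead of `float` so that the one-decimal figures add up exactly; with the values as copied here, the column sum agrees with the reported grand total.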
 ==== Detailed statistics ==== ==== Detailed statistics ====
  
Line 481: Line 521:
 ^:::|Subtitles|  1|  11 378|  11 952.3|  70 963.9|  79 539.4|  448.9|  439.5|  6.046|  1.689|  0.548|  2.081|  0.791|  2.289| ^:::|Subtitles|  1|  11 378|  11 952.3|  70 963.9|  79 539.4|  448.9|  439.5|  6.046|  1.689|  0.548|  2.081|  0.791|  2.289|
 ^:::|Syndicate|  5|  654|  29.7|  675.9|  766.7|  493.8|  489.5|  23.166|  4.110|  1.795|  7.026|  2.391|  3.366| ^:::|Syndicate|  5|  654|  29.7|  675.9|  766.7|  493.8|  489.5|  23.166|  4.110|  1.795|  7.026|  2.391|  3.366|
- 
-==== Number of texts in the Core ==== 
- 
-^  Language  ^^ Number of texts ^ including originals ^ 
-^  ar  ^ Arabic |  3 |  1 | 
-^  be  ^ Belarusian |  108 |  14 | 
-^  bg  ^ Bulgarian |  87 |  19 | 
-^  ca  ^ Catalan |  92 |  1 | 
-^  cs  ^ Czech |  1 812 |  368 | 
-^  da  ^ Danish |  93 |  9 | 
-^  de  ^ German |  471 |  163 | 
-^  en  ^ English |  422 |  271 | 
-^  es  ^ Spanish |  355 |  142 | 
-^  et  ^ Estonian |  1 |  0 | 
-^  fi  ^ Finnish |  112 |  36 | 
-^  fr  ^ French |  277 |  126 | 
-^  hi  ^ Hindi |  7 |  2 | 
-^  hr  ^ Croatian |  324 |  37 | 
-^  hs  ^ Upper Sorbian |  13 |  5 | 
-^  hu  ^ Hungarian |  89 |  1 | 
-^  it  ^ Italian |  171 |  26 | 
-^  ja  ^ Japanese |  35 |  15 | 
-^  lt  ^ Lithuanian |  23 |  4 | 
-^  lv  ^ Latvian |  73 |  15 | 
-^  mk  ^ Macedonian |  108 |  4 | 
-^  nl  ^ Dutch |  215 |  52 | 
-^  no  ^ Norwegian |  102 |  23 | 
-^  pl  ^ Polish |  348 |  54 | 
-^  pt  ^ Portuguese |  87 |  24 | 
-^  rn  ^ Romani |  2 |  2 | 
-^  ro  ^ Romanian |  45 |  5 | 
-^  ru  ^ Russian |  160 |  37 | 
-^  sk  ^ Slovak |  165 |  62 | 
-^  sl  ^ Slovene |  73 |  25 | 
-^  sr  ^ Serbian |  148 |  8 | 
-^  sv  ^ Swedish |  232 |  101 | 
-^  uk  ^ Ukrainian |  199 |  8 | 
-^  zh  ^ Chinese |  3 |  3 | 
-^  **TOTAL**  ^    6 495 |  1 668 | 
-