Differences
This shows you the differences between two versions of the page.
| en:cnk:intercorp:verze9 [2016/07/11 11:15] – [Texts in the corpus] alexandrrosen | en:cnk:intercorp:verze9 [2019/10/06 20:43] (current) – [Taggers/lemmatizers:] michalskrabal | ||
|---|---|---|---|
| Line 25: | Line 25: | ||
| After [[http:// | After [[http:// | ||
| - | InterCorp can be accessed via a standard web browser from the integrated search interface of the Czech National Corpus [[http:// | + | InterCorp can be accessed via a standard web browser from the integrated search interface of the Czech National Corpus [[http:// |
| After signing a non-profit licence agreement, texts from InterCorp can also be acquired as bilingual files including shuffled pairs of sentences. Please contact us at the address below if you are interested. | After signing a non-profit licence agreement, texts from InterCorp can also be acquired as bilingual files including shuffled pairs of sentences. Please contact us at the address below if you are interested. | ||
| A new release of InterCorp is usually published once a year. With each new release, its size, and possibly also the number of languages and the extent and quality of annotation, may grow. Previous versions remain available (starting with release 6). | A new release of InterCorp is usually published once a year. With each new release, its size, and possibly also the number of languages and the extent and quality of annotation, may grow. Previous versions remain available (starting with release 6). | ||
| - | |||
| ===== References ===== | ===== References ===== | ||
| Line 70: | Line 69: | ||
| ^ Language ^^ Core ^ Syndicate ^ Presseurop ^ Acquis ^ Europarl ^ Subtitles ^ Total ^ | ^ Language ^^ Core ^ Syndicate ^ Presseurop ^ Acquis ^ Europarl ^ Subtitles ^ Total ^ | ||
| - | | ar | Arabic | 34 | 0 | 0 | 0 | 0 | 0 | 34 | | + | | ar | Arabic | 34 | 0 | 0 | 0 | 0 | 0 | 34 | |
| - | | be | Belarusian | 3 025 | 0 | 0 | 0 | 0 | 0 | 3 025 | | + | | be | Belarusian | 3,025 | 0 | 0 | 0 | 0 | 0 | 3,025 | |
| - | | bg | Bulgarian | 6 007 | 0 | 0 | 13 816 | 9 083 | 0 | 28 907 | | + | | bg | Bulgarian | 6,007 | 0 | 0 | 13,816 | 9,083 | 0 | 28,907 | |
| - | | ca | Catalan | 4 632 | 0 | 0 | 0 | 0 | 0 | 4 632 | | + | | ca | Catalan | 4,632 | 0 | 0 | 0 | 0 | 0 | 4,632 | |
| - | | da | Danish | 3 556 | 0 | 0 | 21 679 | 13 915 | 14 429 | 53 581 | | + | | da | Danish | 3,556 | 0 | 0 | 21,679 | 13,915 | 14,429 | 53,581 | |
| - | | de | German | 31 168 | 3 725 | 2 482 | 21 723 | 13 089 | 8 366 | 80 556 | | + | | de | German | 31,168 | 3,725 | 2,482 | 21,723 | 13,089 | 8,366 | 80,556 | |
| - | | el | Greek | 0 | 0 | 0 | 25 069 | 15 403 | 23 714 | 64 187 | | + | | el | Greek | 0 | 0 | 0 | 25,069 | 15,403 | 23,714 | 64,187 | |
| - | | en | English | 21 208 | 3 818 | 2 670 | 24 207 | 15 580 | 52 101 | 119 586 | | + | | en | English | 21,208 | 3,818 | 2,670 | 24,207 | 15,580 | 52,101 | 119,586 | |
| - | | es | Spanish | 19 310 | 4 324 | 2 816 | 27 001 | 15 885 | 36 378 | 105 716 | | + | | es | Spanish | 19,310 | 4,324 | 2,816 | 27,001 | 15,885 | 36,378 | 105,716 | |
| - | | et | Estonian | 0 | 0 | 0 | 15 962 | 10 899 | 10 296 | 37 158 | | + | | et | Estonian | 0 | 0 | 0 | 15,962 | 10,899 | 10,296 | 37,158 | |
| - | | fi | Finnish | 3 645 | 0 | 0 | 16 455 | 10 175 | 15 097 | 45 373 | | + | | fi | Finnish | 3,645 | 0 | 0 | 16,455 | 10,175 | 15,097 | 45,373 | |
| - | | fr | French | 12 406 | 4 393 | 2 928 | 27 351 | 17 178 | 25 961 | 90 219 | | + | | fr | French | 12,406 | 4,393 | 2,928 | 27,351 | 17,178 | 25,961 | 90,219 | |
| - | | he | Hebrew | 0 | 0 | 0 | 0 | 0 | 16 221 | 16 221 | | + | | he | Hebrew | 0 | 0 | 0 | 0 | 0 | 16,221 | 16,221 | |
| | hi | Hindi | 408 | 0 | 0 | 0 | 0 | 0 | 408 | | | hi | Hindi | 408 | 0 | 0 | 0 | 0 | 0 | 408 | | ||
| - | | hr | Croatian | 19 980 | 0 | 0 | 0 | 0 | 19 042 | 39 023 | | + | | hr | Croatian | 19,980 | 0 | 0 | 0 | 0 | 19,042 | 39,023 | |
| - | | hu | Hungarian | 5 818 | 0 | 0 | 19 176 | 12 306 | 21 239 | 58 541 | | + | | hu | Hungarian | 5,818 | 0 | 0 | 19,176 | 12,306 | 21,239 | 58,541 | |
| - | | is | Icelandic | 0 | 0 | 0 | 0 | 0 | 1 584 | 1 584 | | + | | is | Icelandic | 0 | 0 | 0 | 0 | 0 | 1,584 | 1,584 | |
| - | | it | Italian | 8 694 | 651 | 2 707 | 24 849 | 15 489 | 14 653 | 67 046 | | + | | it | Italian | 8,694 | 651 | 2,707 | 24,849 | 15,489 | 14,653 | 67,046 | |
| | ja | Japanese | 0 | 0 | 0 | 0 | 0 | 113 | 113 | | | ja | Japanese | 0 | 0 | 0 | 0 | 0 | 113 | 113 | | ||
| - | | lt | Lithuanian | 358 | 0 | 0 | 18 392 | 11 212 | 557 | 30 521 | | + | | lt | Lithuanian | 358 | 0 | 0 | 18,392 | 11,212 | 557 | 30,521 | |
| - | | lv | Latvian | 1 336 | 0 | 0 | | + | | lv | Latvian | 1,666 | 0 | 0 | |
| - | | mk | Macedonian | 4 663 | 0 | 0 | 0 | 0 | 1 877 | 6 540 | | + | | mk | Macedonian | 4,663 | 0 | 0 | 0 | 0 | 1,877 | 6,540 | |
| - | | ms | Malay | 0 | 0 | 0 | 0 | 0 | 3 520 | 3 520 | | + | | ms | Malay | 0 | 0 | 0 | 0 | 0 | 3,520 | 3,520 | |
| - | | mt | Maltese | 0 | 0 | 0 | 14 133 | 0 | 0 | 14 133 | | + | | mt | Maltese | 0 | 0 | 0 | 14,133 | 0 | 0 | 14,133 | |
| - | | nl | Dutch | 11 444 | 314 | 2 955 | 24 746 | 15 563 | 29 362 | 84 386 | | + | | nl | Dutch | 11,444 | 314 | 2,955 | 24,746 | 15,563 | 29,362 | 84,386 | |
| - | | no | Norwegian | 4 965 | 0 | 0 | 0 | 0 | 0 | 4 965 | | + | | no | Norwegian | 4,965 | 0 | 0 | 0 | 0 | 0 | 4,965 | |
| - | | pl | Polish | 21 433 | 0 | 2 378 | 20 627 | 12 811 | 26 572 | 83 822 | | + | | pl | Polish | 21,433 | 0 | 2,378 | 20,627 | 12, | 26,572 | 83,822 | |
| - | | pt | Portuguese | 2 605 | 369 | 2 999 | 28 602 | 16 484 | 43 391 | 94 454 | | + | | pt | Portuguese | 2,605 | 369 | 2,999 | 28,602 | 16,484 | 43,391 | 94,454 | |
| | rn | Romani | 5 | 0 | 0 | 0 | 0 | 0 | 5 | | | rn | Romani | 5 | 0 | 0 | 0 | 0 | 0 | 5 | | ||
| - | | ro | Romanian | 3 432 | 0 | 2 737 | 8 199 | 9 446 | 34 128 | 57 944 | | + | | ro | Romanian | 3,432 | 0 | 2,737 | 8,199 | 9,446 | 34,128 | 57,944 | |
| - | | ru | Russian | 4 788 | 3 174 | 0 | 0 | 0 | 6 885 | 14 848 | | + | | ru | Russian | 4,788 | 3,174 | 0 | 0 | 0 | 6,885 | 14,848 | |
| - | | sk | Slovak | 8 066 | 0 | 0 | 19 222 | 12 734 | 5 134 | 45 158 | | + | | sk | Slovak | 8,066 | 0 | 0 | 19,222 | 12,734 | 5,134 | 45,158 | |
| - | | sl | Slovenian | 2 057 | 0 | 0 | 19 645 | 12 240 | 17 024 | 50 968 | | + | | sl | Slovenian | 2,057 | 0 | 0 | 19,645 | 12,240 | 17,024 | 50,968 | |
| - | | sq | Albanian | 0 | 0 | 0 | 0 | 0 | 2 003 | 2 003 | | + | | sq | Albanian | 0 | 0 | 0 | 0 | 0 | 2,003 | 2,003 | |
| - | | sr | Serbian | 9 886 | 0 | 0 | 0 | 0 | 20 720 | 30 607 | | + | | sr | Serbian | 9,886 | 0 | 0 | 0 | 0 | 20,720 | 30,607 | |
| - | | sv | Swedish | 8 959 | 0 | 0 | 20 585 | 13 840 | 14 693 | 58 079 | | + | | sv | Swedish | 8,959 | 0 | 0 | 20,585 | 13,840 | 14,693 | 58,079 | |
| - | | tr | Turkish | 0 | 0 | 0 | 0 | 0 | 21 190 | 21 190 | | + | | tr | Turkish | 0 | 0 | 0 | 0 | 0 | 21,190 | 21,190 | |
| - | | uk | Ukrainian | 7 597 | 0 | 0 | 0 | 0 | 246 | 7 843 | | + | | uk | Ukrainian | 7,597 | 0 | 0 | 0 | 0 | 246 | 7,843 | |
| - | | vi | Vietnamese | 0 | 0 | 0 | 0 | 0 | 1 473 | 1 473 | | + | | vi | Vietnamese | 0 | 0 | 0 | 0 | 0 | 1,473 | 1,473 | |
| - | | **Subtotal** | | 231 501 | 20 769 | 24 676 | 430 160 | 265 022 | 488 266 | 1 460 397 | | + | | **Subtotal** | | 231,501 | 20,769 | 24,676 | 430,160 | 265,022 | 488,266 | 1,460,397 | |
| - | | cs | Czech | 96 956 | 3 416 | 2 315 | 20 303 | 12 922 | 50 688 | 186 602 | | + | | cs | Czech | 96,956 | 3,416 | 2,315 | 20,303 | 12,922 | 50,688 | 186,602 | |
| - | | **TOTAL** | | 328 458 | 24 186 | 26 991 | 450 463 | 277 945 | 538 954 | 1 647 000 | | + | | **TOTAL** | | 328,458 | 24,186 | 26,991 | 450,463 | 277,945 | 538,954 | 1,647,000 | |
| N.B.: Each Czech text is counted only once, even though it may have more than one foreign counterpart. | N.B.: Each Czech text is counted only once, even though it may have more than one foreign counterpart. | ||
| Line 123: | Line 122: | ||
| ^ Croatian | ✔ | ✔ | | ^ Croatian | ✔ | ✔ | | ||
| ^ Czech | ✔ | ✔ | [[http:// | ^ Czech | ✔ | ✔ | [[http:// | ||
| - | ^ Dutch | ✔ | | + | ^ Dutch | ✔ | |
| ^ English | ✔ | ^ English | ✔ | ||
| ^ Estonian | ✔ | ✔ | [[http:// | ^ Estonian | ✔ | ✔ | [[http:// | ||
| - | ^ Finnish | ✔ | ✔ | | + | ^ Finnish | ✔ | ✔ | |
| ^ French | ✔ | ✔ | [[http:// | ^ French | ✔ | ✔ | [[http:// | ||
| - | ^ German | ✔ | ✔ | [[http:// | + | ^ German | ✔ | ✔ | [[http:// |
| ^ Hungarian | ✔ | | ^ Hungarian | ✔ | | ||
| ^ Icelandic | ✔ | ✔ | [[http:// | ^ Icelandic | ✔ | ✔ | [[http:// | ||
| Line 143: | Line 142: | ||
| ^ Spanish | ✔ | ✔ | [[ftp:// | ^ Spanish | ✔ | ✔ | [[ftp:// | ||
| ^ Swedish | ✔ | ✔ | [[http:// | ^ Swedish | ✔ | ✔ | [[http:// | ||
| + | |||
| Queries containing contracted forms may fail when run against tagged or lemmatized texts. This includes forms such as // | Queries containing contracted forms may fail when run against tagged or lemmatized texts. This includes forms such as // | ||
| Morphological tags including characters with a special meaning in regular expressions, | Morphological tags including characters with a special meaning in regular expressions, | ||
| - | |||
| - | |||
| ====Structural attributes==== | ====Structural attributes==== | ||
| Line 195: | Line 193: | ||
| ==== Pre-processing ==== | ==== Pre-processing ==== | ||
| - | * parallel | + | * Parallel |
| * Aligner [[http:// | * Aligner [[http:// | ||
| * Sentence splitter for Czech by Pavel Květoň | * Sentence splitter for Czech by Pavel Květoň | ||
| Line 214: | Line 212: | ||
| * [[https:// | * [[https:// | ||
| * [[http:// | * [[http:// | ||
| + | * [[https:// | ||
| + | * [[https:// | ||
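
The size table in this diff changes only the thousands separator (space in the old revision, comma in the new one), so a reader comparing the two revisions may want to confirm that the figures themselves are unchanged and that the Subtotal and Czech rows add up to the TOTAL row. The following is a minimal sketch of such a check, not part of the wiki page: the helper names `parse_count` and `check_totals` are hypothetical, and the tolerance of 1 is an assumption made because the published figures appear to be rounded.

```python
import re


def parse_count(cell: str) -> int:
    """Parse a table cell such as '3 025' or '3,025' into an integer.

    Spaces, non-breaking spaces and commas are treated as thousands separators.
    """
    return int(re.sub(r"[ ,\u00a0]", "", cell.strip()))


def parse_row(row: str) -> list[int]:
    """Split a pipe-delimited row of counts and parse every cell."""
    return [parse_count(cell) for cell in row.split("|")]


def check_totals(subtotal, czech, total, tolerance=1):
    """Return one boolean per column: does subtotal + czech match total?

    The tolerance is an assumption that absorbs rounding in the published figures.
    """
    return [abs(s + c - t) <= tolerance for s, c, t in zip(subtotal, czech, total)]


# Columns: Core, Syndicate, Presseurop, Acquis, Europarl, Subtitles, Total
# Values copied from the Subtotal, Czech and TOTAL rows of the table.
subtotal = parse_row("231,501 | 20,769 | 24,676 | 430,160 | 265,022 | 488,266 | 1,460,397")
czech = parse_row("96,956 | 3,416 | 2,315 | 20,303 | 12,922 | 50,688 | 186,602")
total = parse_row("328,458 | 24,186 | 26,991 | 450,463 | 277,945 | 538,954 | 1,647,000")

if __name__ == "__main__":
    # Old (space-separated) and new (comma-separated) cells parse to the same value.
    print(parse_count("1 647 000") == parse_count("1,647,000"))
    print(check_totals(subtotal, czech, total))
```

The same `parse_count` helper can be applied to any language row to compare the old and new side of a changed line cell by cell.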