en:cnk:intercorp:verze16ud [2025/01/10 21:04] (current) – [InterCorp Release 16ud – Universal Dependencies] alexandrrosen (previous revision: [2024/10/11 11:13] – [Corpus size by language] alexandrrosen)
^ ::: ^ publication date | 2024 ^^^^ |
^ ::: ^ foreign languages | 61 ^^^^ |
^ ::: ^ tagged languages | 48 ^^^^ |
^ ::: ^ lemmatized languages | 48 ^^^^ |
^ ::: ^ syntactically annotated languages | 48 ^^^^ |

===== Access to the texts =====
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=pt|pt]]| 107| 147 063| 46 510.1| 280 566.2| 355 121.8|
^[[https://en.wikipedia.org/wiki/Romani_language|rn]]| 2| 2| 1.7| 13.6| 17.7|
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ro|ro]]| 184| 32 839| 22 985.2| 122 130.4| 163 120.7|
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=ru|ru]]| 55| 102 904| 39 561.2| 235 702.3| 295 301.3|
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=si|si]]| 1| 499| 522.5| 2 313.4| 3 021.8|
^[[https://www.loc.gov/standards/iso639-2/php/langcodes_name.php?iso_639_1=sk|sk]]| 170| 94 585| 10 080.0| 74 862.7| 95 881.0|

===== References – about UD-annotated InterCorp =====
Rosen, A. (2024): Lexical and syntactic variability of languages and text genres – a corpus-based study. [[https://www.youtube.com/watch?v=E2ujmqt7Q2E|Recording]] from 14 October 2024: [[https://zil.ipipan.waw.pl/|Natural Language Processing Seminar]] organised by the [[https://zil.ipipan.waw.pl|Linguistic Engineering Group]] at the [[https://ipipan.waw.pl|Institute of Computer Science]], [[https://pan.pl|Polish Academy of Sciences]]; [[https://zil.ipipan.waw.pl/seminarium-archiwum?action=AttachFile&do=view&target=2024-10-14.pdf|slides]].

Nádvorníková, O. (2024): Analyse contrastive de la complexité syntaxique à l’aide de corpus parallèles [Contrastive analysis of syntactic complexity using parallel corpora]. Translitteræ, Laboratoire LATTICE (Langues, Textes, Traitements informatiques et Cognition) – CNRS UMR 8094 (Centre national de la recherche scientifique: Unité mixte de recherche), ENS (L'École normale supérieure). Paris, 28 May 2024. [[https://www.youtube.com/watch?v=wJrCez_XPQY|Video]], [[https://jakobson.korpus.cz/~rosen/INTERCORP/SLIDES/C4%20Nadvornikova%20Analyse%20contrastiv%20e%20de%20la%20complexité%20syntaxique.pdf|slides]].