Differences

This shows you the differences between two versions of the page.

en:cnk:intercorp:verze16ud [2025/01/10 20:52] – [Corpus size by language] alexandrrosen
en:cnk:intercorp:verze16ud [2026/04/07 22:45] (current) – [Main features of release 16ud] alexandrrosen
Line 11:
 ^ ::: ^ publication date |  2024  ^^^^
 ^ ::: ^ foreign languages |  61  ^^^^
-^ ::: ^ tagged languages |  49  ^^^^
-^ ::: ^ lemmatized languages |  49  ^^^^
-^ ::: ^ syntactically annotated languages|  49  ^^^^
+^ ::: ^ tagged languages |  48  ^^^^
+^ ::: ^ lemmatized languages |  48  ^^^^
+^ ::: ^ syntactically annotated languages|  48  ^^^^
 
 ===== Access to the texts =====
Line 30:
   * After 13ud, 16ud is the second release of InterCorp featuring linguistic annotation according to the [[en:pojmy:ud|Universal Dependencies]] scheme.
   * Release 16ud is the first CNC corpus to feature the metrics of <fs large>**[[en:pojmy:syntakticka_komplexita|syntactic complexity]]**</fs> and <fs large>**[[en:pojmy:lexikalni_bohatost|lexical diversity]]**</fs>.((We are grateful to Olga Nádvorníková, whose initiative and guidance made the addition of syntactic complexity and lexical diversity measures to the corpus metadata possible. We also thank Jiří Milička for his valuable advice on selecting appropriate lexical diversity measures.))
-  * In release 16ud, out of the total number of 62 languages (including Czech), **47 are linguistically annotated**; in addition, all such languages are **syntactically annotated**.
+  * In release 16ud, out of the total number of 62 languages (including Czech), **48 are linguistically annotated**; in addition, all such languages are **syntactically annotated**.
   * Texts are **annotated in the same way** in all languages, according to the UD standard ([[https://universaldependencies.org|Universal Dependencies]]).
   * Annotation was performed for all languages by [[https://ufal.mff.cuni.cz/udpipe|UDPipe]], based on the data created in the UD project.((Annotation of this release used the following models: afrikaans-afribooms-ud-2.12-230717,
Line 88:
 
   * Political commentaries published by [[http://www.project-syndicate.org/|Project Syndicate]] (below referred to as **Syndicate**) and [[http://www.voxeurop.eu|VoxEurop]] (formerly **PressEurop**)
-  * A package of legal texts of the European Union from the [[https://ec.europa.eu/jrc/en/language-technologies/jrc-acquis|Acquis Communautaire]] corpus (**Acquis**)
+  * A collection of legal texts of the European Union from the [[https://ec.europa.eu/jrc/en/language-technologies/jrc-acquis|Acquis Communautaire]] corpus (**Acquis**)
   * Proceedings of the European Parliament dated 2007–2011 from the [[http://www.statmt.org/europarl/|Europarl]] corpus (**Europarl**)
   * Film subtitles from the [[http://www.opensubtitles.org/|Open Subtitles]] database (**Subtitles**)