en:cnk:romcro [2026/05/25 17:22] michalkren
====== RomCro 2.0 - Parallel corpus of Romance languages and Croatian ======
  
The project Parallel Corpus in Romance Languages and Croatian (RomCro) started in 2019 at the Chair of Romance Linguistics of the Department of Romance Studies of the Faculty of Humanities and Social Sciences, University of Zagreb. The corpus brings together six Romance languages (French, Portuguese, Romanian, Italian, Spanish and, most recently, Catalan) with Croatian, and thus contributes to the existing linguistic resources for the Croatian language. It consists of literary texts from the 20<sup>th</sup> and 21<sup>st</sup> centuries, with each original-language text accompanied by its translations into the other languages. Segmentation and alignment were performed automatically and checked manually. For copyright reasons, **the sentences are scrambled randomly in the corpus**.
  
The RomCro corpus was created with the support of the Faculty of Humanities and Social Sciences, University of Zagreb, from 2019 to 2025. The new version was also developed as part of a project supported by the Croatian Science Foundation and funded by the European Union – NextGenerationEU (project number: MOBODL 2023 08 9511). The new version of the corpus includes three new titles in Portuguese and Croatian. Furthermore, a sixth Romance language, Catalan, has been added by integrating existing Catalan translations and incorporating three Catalan novels together with their translations into the other languages. Compared to the first version of the corpus (not available in the CNC; cf. also Table 1), RomCro v.2.0 includes 54 new texts, 24,200 more translation units and 3.7 million more words, for a total of 19.4 million words.