Differences

This shows you the differences between two versions of the page.

en:cnk:romcro [2026/05/19 10:50] – old revision restored (2026/05/19 09:35) jankocek
en:cnk:romcro [2026/05/22 17:09] (current) – [How to cite RomCro] michalkren
Line 1: Line 1:
-**RomCro 2.0 - Parallel corpus of Romance languages and Croatian**
+====== RomCro 2.0 - Parallel corpus of Romance languages and Croatian ======
  
 The project Parallel Corpus in Romance Languages and Croatian (RomCro) started in 2019 at the Chair of Romance Linguistics of the Department of Romance Studies of the Faculty of Humanities and Social Sciences, University of Zagreb. The corpus unites five Romance languages (French, Portuguese, Romanian, Italian, Spanish and, recently, Catalan) and, with the addition of Croatian, makes a contribution to the existing linguistic resources for the Croatian language. It consists of literary texts from the 20<sup>th</sup> and 21<sup>st</sup> centuries, with each original-language text accompanied by translations into other languages.
  
-The RomCro corpus was created with the support of the Faculty of Humanities and Social Sciences, University of Zagreb from 2019 to 2025. The new version was also developed as part of a project supported by the Croatian Science Foundation and funded by the European Union – NextGenerationEU (project number: MOBODL 2023 08 9511). The new version of the corpus includes three new titles in Portuguese and Croatian. Furthermore, the sixth Romance language, Catalan, has been added by integrating existing Catalan translations and incorporating three Catalan novels with translations into the other languages. Compared to the first version of the corpus (see Table 1), RomCro v.2.0 includes 54 new texts, 24,200 more translation units, and 3.7 million more words, for a total of 19.4 million words.
+The RomCro corpus was created with the support of the Faculty of Humanities and Social Sciences, University of Zagreb from 2019 to 2025. The new version was also developed as part of a project supported by the Croatian Science Foundation and funded by the European Union – NextGenerationEU (project number: MOBODL 2023 08 9511). The new version of the corpus includes three new titles in Portuguese and Croatian. Furthermore, the sixth Romance language, Catalan, has been added by integrating existing Catalan translations and incorporating three Catalan novels with translations into the other languages. Compared to the first version of the corpus (not available in CNC; cf. also Table 1), RomCro v.2.0 includes 54 new texts, 24,200 more translation units, and 3.7 million more words, for a total of 19.4 million words.
  
 ^                               ^**RomCro v.1.0**^**RomCro v.2.0**^**Difference**^
-|**Languages**                  |6               |7               |**1**         |
+|**Languages**                  |  6               |  **7**           |  1           |
-|**Translation units**          |142,470         |166,742         |**24,272**    |
+|**Translation units**          |  142,470         |  **166,742**     |  24,272      |
-|**Originals**                  |27              |33              |**6**         |
+|**Originals**                  |  27              |  **33**          |  6           |
-|**Texts total**                |159             |213             |**54**        |
+|**Texts total**                |  159             |  **213**         |  54          |
-|**Size (in millions of words)**|15.7            |19.4            |**3.7**       |
+|**Size (in millions of words)**|  15.7            |  **19.4**        |  3.7         |
  
 Table 1. Comparison between the two versions
Line 16: Line 16:
 RomCro was by [[https://ufal.mff.cuni.cz/udpipe|UDPipe]] annotated according to the [[https://universaldependencies.org/|Universal Dependencies]] (UD) standard, which means that it is not only lemmatised and morphologically tagged, but its annotation includes also syntax. RomCro is made available via the [[http://kontext.korpus.cz/|KonText]] user query interface in a way which follows UD versions of the [[https://wiki.korpus.cz/doku.php/en:cnk:intercorp|InterCorp]] parallel corpus.
  
-How to cite RomCro:
+===== How to cite RomCro =====
  
-Bikić-Carić, G.Mikelenić, B. Bezlaj, M. (2023). Construcción del RomCro, un corpus paralelo multilingüe. //Procesamiento del Lenguaje Natural//, 70. Sociedad Española para el Procesamiento del Lenguaje Natural, 99-110.
+Bikić-Carić, G. – Mikelenić, B. – Bezlaj, M. (2023). Construcción del RomCro, un corpus paralelo multilingüe. //Procesamiento del Lenguaje Natural//, 70. Sociedad Española para el Procesamiento del Lenguaje Natural, 99–110.
  
-Mikelenić, B.Bikić-Carić, G.Bezlaj, M.Oliver, A. Tadić, M. (2025). //RomCro v.2.0 - Parallel corpus of Romance languages and Croatian//HR-CLARIN, http://hdl.handle.net/20.500.14615/2-16
+Mikelenić, B. – Bikić-Carić, G. – Bezlaj, M. – Oliver, A. – Tadić, M. (2025). //RomCro v.2.0 - Parallel corpus of Romance languages and Croatian//HR-CLARIN, http://hdl.handle.net/20.500.14615/2-16
  
 * The 2023 paper describes the building of RomCro v.1.0, while the 2025 repository entry refers to RomCro v.2.0 in the HR-CLARIN repository. Please cite both sources when referring to the corpus.