====== InterCorp ======

**InterCorp** is a large parallel synchronic corpus covering a number of languages. The corpus is compiled mostly by teachers and students of the Faculty of Arts, Charles University in Prague, and by other collaborators of the ICNC. It serves as a source of data for theoretical studies, lexicography, student research, (foreign) language learning, computer applications, translators and also for the general public.

All texts in InterCorp and all features of the search interface are available after free [[https://www.korpus.cz/toolbar/signup.php|registration]] and login via the [[en:manualy:kontext:index|KonText]] or [[en:manualy:treq|Treq]] interface. The registration is identical for all public ICNC corpora. No special registration for InterCorp is required if you already have a user login and password for the Czech part of InterCorp.

InterCorp is a part of the Czech National Corpus, a project funded by the Ministry of Education of the Czech Republic within the programme Large Research, Development and Innovation Infrastructures (LM2018137; 2020–22). In 2016–2019, 2012–2015 and 2005–2011 the project was supported from the same source (projects no. LM2015044, LM2011023 and 0021620823, respectively). The entire project is academic and non-commercial.

===== Description =====

Starting with Release 6, InterCorp can be seen as referential: all its previous releases stay available in their originally published form. The volume of texts, the number of languages and the extent of annotation (lemmatization and tagging) may grow with each new release and the introduction of new tools.
For more details about the individual releases of InterCorp see the overview below:

^ Release ^ Publication year ^ Number of words in millions((Total number of words in foreign texts)) ^ Number of foreign languages ^ Tagged / lemmatized ^ List of changes ^
^ [[en:cnk:intercorp:verze16|InterCorp Release 16]] | 2023 | 4,893.0 | 61 | 27 / 25 | [[en:cnk:intercorp:historie#verze 16|Release 16]] |
^ [[en:cnk:intercorp:verze15|InterCorp Release 15]] | 2022 | 1,588.2 | 41 | 27 / 25 | [[en:cnk:intercorp:historie#verze 15|Release 15]] |
^ [[en:cnk:intercorp:verze14|InterCorp Release 14]] | 2022 | 1,572.0 | 41 | 27 / 25 | [[en:cnk:intercorp:historie#verze 14|Release 14]] |
^ [[en:cnk:intercorp:verze13ud|InterCorp Release 13ud]] | 2021 | 1,551.2 | 40 | 35 / 35 | [[en:cnk:intercorp:historie#verze 13ud|Release 13ud]] |
^ [[en:cnk:intercorp:verze13|InterCorp Release 13]] | 2020 | 1,551.2 | 40 | 27 / 25 | [[en:cnk:intercorp:historie#verze 13|Release 13]] |
^ [[en:cnk:intercorp:verze12|InterCorp Release 12]] | 2019 | 1,533.7 | 40 | 27 / 25 | [[en:cnk:intercorp:historie#verze 12|Release 12]] |
^ [[en:cnk:intercorp:verze11|InterCorp Release 11]] | 2018 | 1,508.4 | 39 | 26 / 25 | [[en:cnk:intercorp:historie#verze 11|Release 11]] |
^ [[en:cnk:intercorp:verze10|InterCorp Release 10]] | 2017 | 1,483.8 | 39 | 23 / 22 | [[en:cnk:intercorp:historie#verze 10|Release 10]] |
^ [[en:cnk:intercorp:verze9|InterCorp Release 9]] | 2016 | 1,460.0 | 39 | 23 / 20 | [[en:cnk:intercorp:historie#verze 9|Release 9]] |
^ [[en:cnk:intercorp:verze8|InterCorp Release 8]] | 2015 | 1,423.0 | 38 | 20 / 17 | [[en:cnk:intercorp:historie#verze 8|Release 8]] |
^ [[en:cnk:intercorp:verze7|InterCorp Release 7]] | 2014 | 1,390.0 | 38 | 20 / 17 | [[en:cnk:intercorp:historie#verze 7|Release 7]] |
^ [[en:cnk:intercorp:verze6|InterCorp Release 6]] | 2013 | 867.3 | 31 | 17 / 14 | [[en:cnk:intercorp:historie#verze 6|Release 6]] |
^ [[en:cnk:intercorp:verze5|InterCorp Release 5]] | 2012 | 542.6 | 27 | 17 / 14 | [[en:cnk:intercorp:historie#verze 5|Release 5]] |
^ [[en:cnk:intercorp:verze4|InterCorp Release 4]] | 2011 | 92.3 | 22 | 13 / 10 | [[en:cnk:intercorp:historie#verze 4|Release 4]] |
^ [[en:cnk:intercorp:verze3|InterCorp Release 3]] | 2011 | 72.3 | 22 | 13 / 10 | [[en:cnk:intercorp:historie#verze 3|Release 3]] |
^ InterCorp Release 2 | 2009 | 49.3 | 21 | 10 / 7 | [[en:cnk:intercorp:historie#verze 2|Release 2]] |
^ InterCorp Release 1 | 2009 | 34.5 | 20 | 10 / 7 | [[en:cnk:intercorp:historie#verze 1|Release 1]] |
^ InterCorp Release 0 | 2008 | 25.0 | 19 | 0 / 0 | [[en:cnk:intercorp:historie#verze 0|Release 0]] |

The corpus consists of two parts: //core// and //collections//. The **core** of InterCorp consists mostly of fiction with manually checked alignments. **Collections** are texts acquired in multiple languages, processed and aligned automatically, so concordances may include more misaligned segments. Moreover, collections do not always include all texts from the original source: texts without a Czech counterpart, for example, are left out. Some texts from the Acquis Communautaire and Europarl corpora have been partially corrected or omitted; as a result, they may differ in form or size from the original source. A similar selection was applied to the Open Subtitles database, where, as an additional reduction, only a single translation was selected per title and language. On the other hand, some metadata items missing in the original resource but detectable from context or other sources have been added.

Each text has a Czech counterpart. As a result, Czech is the pivot language: for every text there is a single Czech version (original or translation), aligned with one or more foreign-language versions.

InterCorp can be accessed via a standard web browser from the integrated search interface of the Czech National Corpus, [[en:manualy:kontext:index|KonText]] (previously also via [[pojmy:korpusovy_manazer#nosketch_engine|NoSketch Engine]] and [[pojmy:korpusovy_manazer#park|Park]]).
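
The pivot arrangement described above means that two foreign-language versions of a text are related only through the Czech version they are both aligned with. The following is a minimal sketch of that idea, not a description of how InterCorp stores its data: the tuple layout, the ''ALIGNMENTS'' sample and the ''pair_via_pivot'' function are hypothetical, and real alignments may link several segments to several segments rather than one to one.

```python
from collections import defaultdict

# Hypothetical toy data: (text_id, language, czech_segment_id, segment_text).
# Every foreign segment points to a segment of the single Czech version.
ALIGNMENTS = [
    ("doc1", "en", 1, "I love you."),
    ("doc1", "de", 1, "Ich liebe dich."),
    ("doc1", "en", 2, "Good night."),
    ("doc1", "de", 2, "Gute Nacht."),
    ("doc1", "fr", 2, "Bonne nuit."),
]


def pair_via_pivot(alignments, lang_a, lang_b):
    """Pair segments of two foreign languages that share a Czech pivot segment."""
    # Group all foreign segments by the Czech segment they are aligned with.
    by_pivot = defaultdict(dict)
    for text_id, lang, cs_id, segment in alignments:
        by_pivot[(text_id, cs_id)][lang] = segment

    # Keep only the pivot segments for which both languages are present.
    pairs = []
    for key in sorted(by_pivot):
        segments = by_pivot[key]
        if lang_a in segments and lang_b in segments:
            pairs.append((segments[lang_a], segments[lang_b]))
    return pairs


if __name__ == "__main__":
    for en, de in pair_via_pivot(ALIGNMENTS, "en", "de"):
        print(en, "|", de)
```

In this sketch, a language pair with no shared Czech segment simply yields an empty list, which mirrors the fact that texts without a Czech counterpart cannot be aligned at all.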
There is a [[kurz:uvod|Czech tutorial]] on KonText.

[{{:en:cnk:vyhledavani_ic_en.png?700|Specifying a parallel query}}] \\
[{{:cnk:intercorp1love.png?700|Result of a query for substrings //lieb// and //lov//}}]

===== Contacts =====

==== Project coordination, technical support and web pages administration ====

[[http://utkl.ff.cuni.cz/~rosen|Alexandr Rosen]]\\
[[http://utkl.ff.cuni.cz/|Institute of Theoretical and Computational Linguistics]]\\
email: alexandr.rosen(at mark)ff.cuni.cz

==== Discussion group ====

intercorp(at mark)ff.cuni.cz - group address, please use when appropriate

===== Participants =====

==== Coordinators for specific languages ====

| {{:cnk:vlajka-velka-ar.gif?direct&18}} | **Arabic** \\ [[http://enlil.ff.cuni.cz/node/159|Doc. PhDr. Petr Zemánek CSc.]]\\ [[http://enlil.ff.cuni.cz/|Institute of Comparative Linguistics]] \\ [[http://milicka.cz|PhDr. Jiří Milička, Ph.D.]]\\ [[http://ucnk.ff.cuni.cz/|Institute of the Czech National Corpus]] |
| {{:cnk:vlajka-velka-by.gif?direct&18}} | **Belarusian** \\ [[http://www.bialkovich.cz|PhDr. Veranika Bialkovich]] |
| {{:cnk:vlajka-velka-bg.gif?direct&18}} | **Bulgarian** \\ [[http://kjbs.ff.cuni.cz/cs/ustavkatedra/vyucujici/bulharistika/prof-phdr-hana-gladkova-csc/|Prof. PhDr. Hana Gladkova, CSc.]]\\ [[http://kjbs.ff.cuni.cz/en/|Department of South Slavonic and Balkan Studies]] \\ Mgr. Natalie Kalajdžievová Ph.D.\\ [[http://kjbs.ff.cuni.cz/|Department of South Slavonic and Balkan Studies]] |
| {{:cnk:vlajka-velka-ca.gif?direct&18}} | **Catalan** \\ Mgr. Andreu Bauçà i Sastre, Ph.D.\\ [[http://www.carlemany.cz/|Centre Carlemany de Llengua Catalana, Department of Romance Studies]], \\ [[http://www.ensenyamentsuperior.ad/|Ensenyament Superior, Recerca i Ajuts a l’Estudi, Govern d'Andorra]] |
| {{:cnk:vlajka-velka-zh.png?direct&18}} | **Chinese** \\ [[http://www.kas.upol.cz/katedra/clenove_katedry/Dobecka_Vlastimil.html|Mgr. Vlastimil Dobečka]] \\ [[http://www.kas.upol.cz/en/katedra/o_katedre.html|Department of Asian Studies, Faculty of Arts, Palacký University, Olomouc]] |
| {{:cnk:vlajka-velka-hr.gif?direct&18}} | **Croatian** \\ [[http://kjbs.ff.cuni.cz/?q=node/176|Mgr. Karel Jirásek, Ph.D.]] \\ [[http://kjbs.ff.cuni.cz/|Department of South Slavonic and Balkan Studies]] |
| {{:cnk:vlajka-velka-da.gif?direct&18}} | **Danish** \\ Mgr. Jana Pavlisová \\ Mgr. Kateřina Haušildová\\ [[http://german.ff.cuni.cz/en/|Department of Germanic Studies]] |
| {{:cnk:vlajka-velka-nl.gif?direct&18}} | **Dutch** \\ Mgr. Eliška Boková \\ PhDr. Zdenka Hrnčířová\\ [[http://german.ff.cuni.cz/en/|Department of Germanic Studies]] |
| {{:cnk:vlajka-velka-en.gif?direct&18}} | **English** \\ [[https://uajd.ff.cuni.cz/en/department/people/denisa-sebestova// |Mgr. Denisa Šebestová]]\\ [[https://uajd.ff.cuni.cz/en/|Department of English Language and ELT Methodology]]\\ [[https://ling.ff.cuni.cz/en/marketa_mala-2/ |doc. PhDr. Markéta Malá, Ph.D.]]\\ [[http://ulug.ff.cuni.cz/|Department of Linguistics]]\\ [[http://www.anglistika.upol.cz/katedra/clenove_katedry/Kubanek_Michal.html|Mgr. Michal Kubánek]]\\ [[http://www.anglistika.upol.cz/katedra/o_katedre.html|Department of English and American Studies, Faculty of Arts, Palacký University Olomouc]] |
| {{:cnk:vlajka-velka-fi.gif?direct&18}} | **Finnish** \\ [[http://fin.ff.cuni.cz/cs/ustavkatedra/lenka-farova/|Mgr. Lenka Fárová, Ph.D.]] \\ [[http://germanic.ff.cuni.cz/en/|Department of Germanic Studies]] |
| {{:cnk:vlajka-velka-fr.gif?direct&18}} | **French** \\ [[http://urs.ff.cuni.cz/vyucujici/francouzstina/olga-nadvornikova/|PhDr. Olga Nádvorníková Ph.D.]]\\ [[http://urs.ff.cuni.cz/en/|Department of Romance Studies]] |
| {{:cnk:vlajka-velka-de.gif?direct&18}} | **German** \\ [[http://german.ff.cuni.cz/cs/vyucujici/stepan-zbytovsky/|Mgr. Štěpán Zbytovský, Ph.D.]]\\ [[http://german.ff.cuni.cz/|Department of Germanic Studies]] \\ [[http://is.muni.cz/lide/index.pl?uco=363|Mgr. Tomáš Káňa, Ph.D.]]\\ [[http://www.ped.muni.cz/wger/|Department of German Language and Literature, Faculty of Education, Masaryk University, Brno]] \\ [[http://is.muni.cz/lide/index.pl?uco=537|PhDr. Hana Peloušková, Ph.D.]]\\ [[http://www.ped.muni.cz/wger/|Department of German Language and Literature, Faculty of Education, Masaryk University, Brno]]\\ PhDr. Vít Dovalil, Ph.D.\\ [[http://german.ff.cuni.cz/|Department of Germanic Studies]] |
| {{:cnk:vlajka-velka-in.gif?direct&18}} | **Hindi** \\ [[http://ujca.ff.cuni.cz/UJCA-351.html|Mgr. Nora Melnikova, Ph.D.]] \\ [[http://ujca.ff.cuni.cz/UJCA-57.html|Institute of South and Central Asia]]\\ Bc. Vojtěch Diatka\\ [[http://ulug.ff.cuni.cz/|Department of Linguistics]] |
| {{:cnk:vlajka-velka-hu.gif?direct&18}} | **Hungarian** \\ [[http://kses.ff.cuni.cz/cs/katedra/vyucujici/simona-kolmanova/|Mgr. Simona Kolmanová, Ph.D.]]\\ [[http://kses.ff.cuni.cz/en/|Department of Central European Studies]] |
| {{:cnk:vlajka-velka-it.gif?direct&18}} | **Italian** \\ [[http://www.pavel-stichauer.cz/|doc. Pavel Štichauer, Ph.D.]]\\ [[http://urs.ff.cuni.cz/en/|Department of Romance Studies]] |
| {{cnk:vlajka-velka-ja.png?direct&18}} | **Japanese** \\ [[https://udlv.ff.cuni.cz/en/ieas/structure-and-staff/petra-kasanugi/|Mgr. Petra Kanasugi, Ph.D.]]\\ [[http://udlv.ff.cuni.cz/en/|Institute of East Asian Studies]] |
| {{:cnk:vlajka-velka-latvia.gif?direct&18}} | **Latvian** \\ [[https://ucnk.ff.cuni.cz/cs/ustav/lide/michal-skrabal/|Mgr. Michal Škrabal, Ph.D.]] \\ [[http://ucnk.ff.cuni.cz/|Institute of the Czech National Corpus]]\\ Mgr. Marija Lazar |
| {{:cnk:vlajka-velka-lt.gif?direct&18}} | **Lithuanian** \\ Mgr. Věra Kociánová \\ RNDr. Hana Skoumalová, Ph.D. |
| {{:cnk:vlajka-velka-mk.gif?direct&18}} | **Macedonian** \\ PhDr. Michala Adamová\\ [[http://ucnk.ff.cuni.cz/|Institute of the Czech National Corpus]] \\ Mgr. Vojkan Milenković |
| {{:cnk:vlajka-velka-no.gif?direct&18}} | **Norwegian** \\ [[http://ucnk.ff.cuni.cz/pavo.html|Mgr. Pavel Vondřička Ph.D.]]\\ [[http://ucnk.ff.cuni.cz/|Institute of the Czech National Corpus]] |
| {{:cnk:vlajka-velka-pl.gif?direct&18}} | **Polish** \\ Mgr. Łucja Bańczyk \\ Dr. Renata Dybalska\\ [[http://kses.ff.cuni.cz/en/|Department of Central European Studies]] |
| {{:cnk:vlajka-velka-pt.gif?direct&18}} | **Portuguese** \\ [[http://romanistika.ff.cuni.cz/pt/Jindrova.html|PhDr. Jaroslava Jindrová Ph.D.]]\\ [[http://urs.ff.cuni.cz/en/|Department of Romance Studies]] |
| {{:cnk:vlajka-velka-rn.png?direct&18}} | **Romani** \\ Ruben Pellar, Master of Arts, Ph.D. |
| {{:cnk:vlajka-velka-ro.gif?direct&18}} | **Romanian** \\ Ing. Alexandr Krestovský\\ Univerzita Karlova v Praze CERGE |
| {{:cnk:vlajka-velka-ru.gif?direct&18}} | **Russian** \\ PhDr. Natálie Rajnochová, Ph.D.\\ [[http://uves.ff.cuni.cz/en/|Department of East European Studies]] \\ Mgr. Naděžda Runštuková |
| {{:cnk:vlajka-velka-sr.gif?direct&18}} | **Serbian** \\ [[https://ubs.ff.cuni.cz/cs/o-katedre/vyucujici/phdr-ana-adamovicova/|PhDr. Ana Adamovičová]]\\ [[http://ubs.ff.cuni.cz/en/|Institute of Czech Studies]] |
| {{:cnk:vlajka-velka-sk.gif?direct&18}} | **Slovak** \\ doc. PhDr. Mira Nábělková CSc.\\ [[http://uves.ff.cuni.cz/en/|Department of East European Studies]] |
| {{:cnk:vlajka-velka-sl.gif?direct&18}} | **Slovenian** \\ Mgr. Leoš Soustružník \\ Mgr. David Blažek, Ph.D.\\ [[http://www.slu.cas.cz/|Institute of Slavonic Studies, Czech Academy of Sciences]] |
| {{:cnk:vlajka-velka-es.gif?direct&18}} | **Spanish** \\ [[http://urs.ff.cuni.cz/vyucujici/spanelstina/petr-cermak/|Doc. PhDr. Petr Čermák, Ph.D.]]\\ [[http://urs.ff.cuni.cz/en/|Department of Romance Studies]] |
| {{:cnk:vlajka-velka-sv.gif?direct&18}} | **Swedish** \\ Lenka John\\ [[http://www.swedenabroad.com/cs-CZ/Embassies/Prague//|Embassy of Sweden]]\\ Mgr. Silvie Cinková, Ph.D. |
| {{:cnk:vlajka-velka-uk.gif?direct&18}} | **Ukrainian** \\ Dr. Natalia Kotsyba |

===== Citing InterCorp =====

**Specific language combination**: Author 1, Author 2 & Author 3((The list of authors for each language is given in KonText in the general information about a corpus, which is shown after clicking on the name of the corpus under the KonText logo.)) (2022): //InterCorp – English, German ((Fill in the languages you use.)), Release 15 of 11 November 2022//. Institute of the Czech National Corpus, Charles University, Prague. Available from: http://www.korpus.cz

**Whole corpus**: Rosen, A., Vavřín, M. & Zasina, A. J. (2022): //InterCorp, Release 15 of 11 November 2022//. Institute of the Czech National Corpus, Charles University. Available from: http://www.korpus.cz

Čermák, F. & Rosen, A. (2012): The case of InterCorp, a multilingual parallel corpus. //International Journal of Corpus Linguistics//, 17(3), 411–427. [[http://www.jbe-platform.com/content/journals/10.1075/ijcl.17.3.05cer|electronic version at IngentaConnect]], [[http://utkl.ff.cuni.cz/~rosen/public/2012_intercorp_ijcl.pdf|preprint version]]

===== See also =====

[[en:cnk:uvod|CNC corpora]]

[[https://intercorp.korpus.cz/?lang=en|The original InterCorp site]]