InterCorp is a large parallel synchronic corpus covering a number of languages. The corpus is compiled mostly by teachers and students of the Faculty of Arts, Charles University in Prague, and by other collaborators of the ICNC. It serves as a source of data for theoretical studies, lexicography, student research, (foreign) language learning, computer applications, translators and also for the general public.

All texts in InterCorp and all features of the search interface are available after free registration and login. The registration is the same for all public ICNC corpora. No special registration for InterCorp is required if you already have a user login and password for the Czech part of InterCorp.

InterCorp is a part of the Czech National Corpus, a project funded by the Ministry of Education of the Czech Republic within the programme Large Research, Development and Innovation Infrastructures (LM2015044; 2016-2019). In 2012-2015 and 2005-2011 the project was supported from the same source (projects no. LM2011023 and 0021620823, respectively). The entire project is academic and non-commercial.


Starting with Release 6, InterCorp can be seen as referential: all its previous releases stay available in their originally published form. The volume of texts, the number of languages and the extent of annotation (lemmatization and tagging) may grow with each new release and the introduction of new tools.

For more details about the individual releases of InterCorp see the overview below:

Release | Publication year | Number of words in millions 1) | Number of foreign languages | Tagged / lemmatized | List of changes
InterCorp Release 9 | 2016 | 1,460.0 | 39 | 23 / 20 | Release 9
InterCorp Release 8 | 2015 | 1,423.0 | 38 | 20 / 17 | Release 8
InterCorp Release 7 | 2014 | 1,390.0 | 38 | 20 / 17 | Release 7
InterCorp Release 6 | 2013 | 867.3 | 31 | 17 / 14 | Release 6
InterCorp Release 5 | 2012 | 542.6 | 27 | 17 / 14 | Release 5
InterCorp Release 4 | 2011 | 92.3 | 22 | 13 / 10 | Release 4
InterCorp Release 3 | 2011 | 72.3 | 22 | 13 / 10 | Release 3
InterCorp Release 2 | 2009 | 49.3 | 21 | 10 / 7 | Release 2
InterCorp Release 1 | 2009 | 34.5 | 20 | 10 / 7 | Release 1
InterCorp Release 0 | 2008 | 25.0 | 19 | 0 / 0 | Release 0

The corpus consists of two parts: the core and the collections. The core consists mostly of fiction with manually checked alignments. The collections are texts acquired in multiple languages and processed and aligned automatically, so concordances from them may include more misaligned segments. Moreover, the collections do not always include all texts from the original source; texts without a Czech counterpart, for example, are left out. Some texts from the Acquis Communautaire and Europarl corpora have been partially corrected or omitted, so they may differ in form or size from the original source. A similar selection was applied to the Open Subtitles database, where, as an additional reduction, only a single translation was selected per title and language. On the other hand, some metadata items missing in the original resource but detectable from context or other sources have been added.

Each text has a Czech counterpart. As a result, Czech is the pivot language: for every text there is a single Czech version (original or translation), aligned with one or more foreign-language versions.
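
One practical consequence of the pivot design can be sketched in a few lines of Python. The sketch assumes that two foreign-language versions are paired only indirectly, through the Czech segments they are both aligned with. The function name, the segment identifiers and the dictionary layout are hypothetical and chosen for illustration; they do not describe the actual InterCorp data format.

```python
def pair_via_pivot(cs_to_a, cs_to_b):
    """Pair segments of two foreign versions through shared Czech segments.

    Each argument maps a Czech segment id to the list of segment ids it is
    aligned with in one foreign version (an empty list = no counterpart).
    """
    pairs = []
    # Only Czech segments present in both alignments can link A and B.
    for cs_id in sorted(cs_to_a.keys() & cs_to_b.keys()):
        a_ids, b_ids = cs_to_a[cs_id], cs_to_b[cs_id]
        if a_ids and b_ids:  # skip segments with a missing counterpart
            pairs.append((cs_id, a_ids, b_ids))
    return pairs


# Hypothetical toy alignments of one text: Czech-English and Czech-German.
cs_to_en = {"cs:1": ["en:1"], "cs:2": ["en:2", "en:3"], "cs:3": []}
cs_to_de = {"cs:1": ["de:1"], "cs:2": ["de:2"], "cs:4": ["de:3"]}

for cs_id, en_ids, de_ids in pair_via_pivot(cs_to_en, cs_to_de):
    print(cs_id, en_ids, de_ids)
```

In this toy data only the Czech segments cs:1 and cs:2 link English to German: cs:3 has no English counterpart and cs:4 is missing from the Czech-English alignment.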

InterCorp can be accessed via a standard web browser from KonText, the integrated search interface of the Czech National Corpus (previously also via NoSketch Engine and Park). A tutorial on KonText is available in Czech.

Figure: Specifying a parallel query
Figure: Result of a query for substrings lieb and lov


Project coordination, technical support and web pages administration: martin.vavrin(at mark)

Project administration: alexandr.rosen(at mark), lucie.novakova(at mark)

Discussion group: intercorp(at mark) - group address, please use only in justified cases


Project administration

Software and technical support

Coordinators for specific languages

Doc. PhDr. Petr Zemánek CSc.
Ústav srovnávací jazykovědy
Mgr. Jiří Milička
Ústav srovnávací jazykovědy
PhDr. Veranika Bialkovich
Prof. PhDr. Hana Gladkova, CSc.
Ústav slavistických a východoevropských studií
Mgr. Natalie Kalajdžievová Ph.D.
Katedra jihoslovanských a balkanistických studií
Mgr. Andreu Bauçà i Sastre, PhD.
Lektorát katalánského jazyka, Ústav románských studií
Mgr. Joan Ramon Marina Amat
Ústav vysokoškolského vzdělávání a výzkumu, Ministerstvo školství a mládeže, Andorra
Mgr. Karel Jirásek, Ph.D.
Katedra jihoslovanských a balkanistických studií
Mgr. Jana Pavlisová
Mgr. Kateřina Haušildová
Ústav germánských studií
Mgr. Eliška Boková
PhDr. Zdenka Hrnčířová
Ústav germánských studií
Prof. PhDr. Aleš Klégr
Ústav anglického jazyka a didaktiky
PhDr. Markéta Malá, Ph.D.
Ústav anglického jazyka a didaktiky
PhDr. Pavlína Šaldová, Ph.D.
Ústav anglického jazyka a didaktiky
Mgr. Leona Rohrauer
Ústav anglického jazyka a didaktiky
Mgr. Michal Kubánek
Katedra anglistiky a amerikanistiky UP
Mgr. Lenka Fárová, Ph.D.
Ústav lingvistiky a ugrofinistiky
PhDr. Olga Nádvorníková Ph.D.
Ústav románských studií
PhDr. Vít Dovalil, Ph.D.
Ústav germánských studií
Mgr. Štěpán Zbytovský, Ph.D.
Ústav germánských studií
Mgr. Tomáš Káňa, Ph.D.
Katedra německého jazyka a literatury PeF MU v Brně
PhDr. Hana Peloušková, Ph.D.
Katedra německého jazyka a literatury PeF MU v Brně
Bc. Vojtěch Diatka
Ústav obecné lingvistiky
Mgr. Simona Kolmanová, Ph.D.
Katedra středoevropských studií
doc. Pavel Štichauer, Ph.D.
Ústav románských studií
Mgr. Michal Škrabal, Ph.D.
Ústav slavistických a východoevropských studií
RNDr. Hana Skoumalová, Ph.D.
Ústav teoretické a komputační lingvistiky
PhDr. Michala Adamová
Ústav Českého národního korpusu
Mgr. Vojkan Milenkovik
Ústav slavistických a východoevropských studií
Mgr. Pavel Vondřička Ph.D.
Ústav Českého národního korpusu
Mgr. Łucja Bańczyk
Dr. Renata Dybalska
Ústav slavistických a východoevropských studií
PhDr. Jaroslava Jindrová Ph.D.
Ústav románských studií
Ruben Pellar, Master of Arts, Ph.D.
Ing. Alexandr Krestovský
Univerzita Karlova v Praze CERGE
PhDr. Natálie Rajnochová, Ph.D.
Ústav slavistických a východoevropských studií
Mgr. Naděžda Runštuková
PhDr. Ana Adamovičová
Ústav bohemistických studií
doc. PhDr. Mira Nábělková CSc.
Ústav slavistických a východoevropských studií
Mgr. Leoš Soustružník
Mgr. David Blažek, Ph.D.
Slovanský ústav AV ČR
Doc. PhDr. Petr Čermák, Ph.D.
Ústav románských studií
Mgr. Silvie Cinková, Ph.D.
Ústav formální a aplikované lingvistiky MFF UK
Dr. Natalia Kotsyba

Citing InterCorp

Specific language combination: Author, 1., Author, 2., Author, 3. 2): InterCorp – English, German 3), Release 8 of 4 June 2015. Institute of the Czech National Corpus, Charles University, Prague 2015. Available from:

Whole corpus: Rosen, A., Vavřín, M.: InterCorp, Release 8 of 4 June 2015. Institute of the Czech National Corpus, Charles University, Prague 2015. Available from:
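
The two citation patterns above differ only in the author list and in the optional list of languages, so they can be produced by one small helper. The following is a minimal sketch; the function name and its parameters are hypothetical, and the address after "Available from:" must be supplied by the caller because it is not given here.

```python
def cite_intercorp(authors, release, release_date, year, url, languages=None):
    """Build an InterCorp citation string following the patterns above.

    `languages` is left out when citing the whole corpus. `url` is whatever
    address belongs after "Available from:"; it is not hard-coded here.
    """
    title = "InterCorp"
    if languages:
        title += " – " + ", ".join(languages)
    return (
        f"{', '.join(authors)}: {title}, Release {release} of {release_date}. "
        f"Institute of the Czech National Corpus, Charles University, "
        f"Prague {year}. Available from: {url}"
    )


# Whole corpus, Release 8; "<URL>" is a placeholder, not a real address.
print(cite_intercorp(["Rosen, A.", "Vavřín, M."], 8, "4 June 2015", 2015, "<URL>"))
```

For a specific language combination, pass the authors listed for those languages and, for example, languages=["English", "German"].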

Čermák, F. – Rosen, A. (2012): The case of InterCorp, a multilingual parallel corpus. International Journal of Corpus Linguistics, 17(3), 411–427.


1) Total number of words in the foreign-language texts.
2) The list of authors for each language is available in KonText in the general information about a corpus, which is displayed when you click the name of the corpus under the KonText logo.
3) Fill in the languages you use.