Differences

This shows you the differences between two versions of the page.

en:cnk:intercorp:verze8 [2016/06/03 22:05] – [Corpus size in the number of words] --> in thousands of words (rounded down as in Czech version) Václav Horký
en:cnk:intercorp:verze8 [2018/07/30 15:12] (current) – [Access to the texts] Václav Cvrček
Line 1: Line 1:
 ~~NOTOC~~
-====== InterCorp ======
+====== InterCorp Release 8 ======
  
  
Line 25: Line 25:
 After [[http://korpus.cz/english/prohlaseni-aj.php|registration]] the corpus can be searched using a web interface. The registration is valid for all ICNC corpora with public access. If you already have a user name and password for the Czech part of the Czech National Corpus, you do not need to register for the parallel corpus.
  
-InterCorp can be accessed via a standard web browser from the integrated search interface of the Czech National Corpus [[http://kontext.korpus.cz/|KonText]]. A tutorial in Czech is available [[kurz:uvod|here]].
+InterCorp can be accessed via a standard web browser from the integrated search interface of the Czech National Corpus [[http://kontext.korpus.cz/|KonText]]. A tutorial is available [[kurz:uvod|in Czech]] and [[en:kurz:hledani_v_paralelnim_korpusu|a brief summary also in English]].
  
 After signing a non-profit licence agreement, texts from InterCorp can also be acquired as bilingual files including shuffled pairs of sentences. Please contact us at the address below if you are interested.
Line 34: Line 34:
 ===== References =====
  
-We would appreciate a link to the project site www.korpus.cz/intercorp in results of your work based on InterCorp. You might also consider adding the following reference in your scientific publications: Čermák, F. and Rosen, A. (2012). The case of InterCorp, a multilingual parallel corpus. International Journal of Corpus Linguistics, 13(3):411–427 (bibtex((''@article{cermak:rosen:10, Author = {Franti{\v{s}}ek {\v{C}}erm{\'{a}}k and Alexandr Rosen}, Issn = {1384-6655}, Journal = {International Journal of Corpus Linguistics}, Number = {3}, Pages = {411--427}, Title = {The Case of {I}nter{C}orp, a multilingual parallel corpus}, Url = {http://utkl.ff.cuni.cz/~rosen/public/2012_intercorp_ijcl.pdf}, Volume = {13}, Year = {2012}}'')), [[http://dx.doi.org/10.1075/ijcl.17.3.05cer|electronic edition at ingentaConnect]], [[http://utkl.ff.cuni.cz/~rosen/public/2012_intercorp_ijcl.pdf|preprint version]]).
+If you publish results based on InterCorp we would appreciate a link to the project site [[http://www.korpus.cz/intercorp|www.korpus.cz/intercorp]]. In your scientific publications please cite the following paper:
  
-For more references see the [[https://biblio.korpus.cz/|repository of bibliographical items based on the CNC]]. All references to work using InterCorp is welcome. See [[https://www.korpus.cz/biblio_appeal.php|here]] for details.
+<WRAP round info 50%>
 +Čermák, F., Rosen, A. (2012). The case of InterCorp, a multilingual parallel corpus. //International Journal of Corpus Linguistics//. Vol. 17, no. 3, p. 411–427
 +([[http://ucnk.ff.cuni.cz/intercorp/?req=page:references_bibtex&lang=cs|bibtex]], [[http://dx.doi.org/10.1075/ijcl.17.3.05cer|electronic edition at ingentaConnect]], [[http://utkl.ff.cuni.cz/~rosen/public/2012_intercorp_ijcl.pdf|preprint version]])
  
 +For more references see the [[https://biblio.korpus.cz/|repository of bibliographical items based on the CNC]]. All references to work using InterCorp are welcome. See [[https://www.korpus.cz/biblio_appeal.php|here]] for details.
  
 +When citing a specific part of InterCorp please use the reference displayed in KonText in the corpus description, e.g. as:
 +
 +Rosen, A., Vavřín, M.: //Korpus InterCorp – English, German((Insert actually used languages.)), version 7 from 19 Dec 2014//. Institute of the Czech National Corpus, Charles University, Prague 2014. Available on-line: http://www.korpus.cz
 +
 +</WRAP>
 ===== Texts in the corpus =====
  
Line 63: Line 71:
  
 ^ Language ^^ Core ^ Syndicate ^ Presseurop ^ Acquis ^ Europarl ^ Subtitles ^ Total ^
-| ar | Arabic | 34 |  0 |  0 |  0 |  0 |  0 |  34 |
-| be | Belarusian | 2 152 |  0 |  0 |  0 |  0 |  0 |  2 152 |
-| bg | Bulgarian | 5 240 |  0 |  0 |  13 816 |  9 083 |  0 |  28 140 |
-| ca | Catalan |  4 632 |  0 |  0 |  0 |  0 |  0 |  4 632 |
-| da | Danish |  3 016 |  0 |  0 |  21 679 |  13 915 |  14 429 |  53 042 |
-| de | German |  27 681 |  3 725 |  2 482 |  21 723 |  13 089 |  8 366 |  77 069 |
+|  ar  | Arabic | 34 |  0 |  0 |  0 |  0 |  0 |  34 |
+|  be  | Belarusian | 2 152 |  0 |  0 |  0 |  0 |  0 |  2 152 |
+|  bg  | Bulgarian | 5 240 |  0 |  0 |  13 816 |  9 083 |  0 |  28 140 |
+|  ca  | Catalan |  4 632 |  0 |  0 |  0 |  0 |  0 |  4 632 |
+|  da  | Danish |  3 016 |  0 |  0 |  21 679 |  13 915 |  14 429 |  53 042 |
+|  de  | German |  27 681 |  3 725 |  2 482 |  21 723 |  13 089 |  8 366 |  77 069 |
 |  el  | Greek |  0 |  0 |  0 |  25 069 |  15 403 |  23 714 |  64 187 |
 |  en  | English |  15 488 |  3 818 |  2 670 |  24 207 |  15 580 |  52 101 |  113 865 |
Line 211: Line 219:
  
   * Fiction in many Slavic and some other languages from [[http://www.uva.nl/over-de-uva/organisatie/medewerkers/content/b/a/a.a.barentsen/a.a.barentsen.html#tab_3|ASPAC – Amsterdam Slavic Parallel Aligned Corpus]] – with special thanks to Adrian Barentsen
-  * Political commentaries in a number of languages from the site [[http://www.project-syndicate.org/|Project Syndicate]]\\ {{:cnk:intercorp:projectsyndicate.png?direct&319}}
+  * Political commentaries in a number of languages from the site [[http://www.project-syndicate.org/|Project Syndicate]]
   * Newspaper texts in a number of languages from the [[http://www.voxeurop.eu|Presseurop/VoxEurop]] server
   * Legal texts in EU languages from the [[http://wt.jrc.it/lt/Acquis/|JRC-ACQUIS]] corpus
Line 246: Line 254:
  
  
- 
- 
-===== Citing InterCorp ===== 
- 
-<WRAP round tip 70%> 
-Rosen, A. – Vavřín, M.: //Korpus InterCorp – English, German((Insert actually used languages.)), version 7 from 19 Dec 2014//. Ústav Českého národního korpusu FF UK, Praha 2014. Available on-line: http://www.korpus.cz 
- 
-Čermák, F. – Rosen, A. (2012): The case of InterCorp, a multilingual parallel corpus. //International Journal of Corpus Linguistics//, 17(3), 411–427. 
-</WRAP>