Differences

This shows you the differences between two versions of the page.

en:cnk:intercorp:historie [2020/10/25 20:29] – [Release 11] alexandrrosen
en:cnk:intercorp:historie [2022/11/23 14:31] – [Release 14] alexandrrosen

Line 1: Line 1:
  
 ====== InterCorp: Version history ======
 +
 +===== Release 15 =====
 +
 +Published 11 November 2022
 +
 +== Data: ==
 +
 +  * Total number of word forms in foreign language texts: 1 588 mil., including 362 mil. core and 1 226 mil. collections
 +  * Total number of word forms in Czech texts: 210 mil., including 120 mil. core and 90 mil. collections
 +  * The //Project Syndicate// collection was extended with texts published in 2019–2021; Arabic and Chinese texts were included for the first time
 +  * Starting with this release, Norwegian is tagged with UDPipe instead of a national tagger; tokenization and the tagset follow the Universal Dependencies standard, as for Belarusian and Ukrainian
 +  * [[en:cnk:intercorp:verze15|Information about the corpus]]
 +
 +
 +===== Release 14 =====
 +
 +Published 31 January 2022
 +
 +== Data: ==
 +
 +  * Total number of word forms in foreign language texts: 1 572 mil., including 349 mil. core and 1 223 mil. collections
 +  * Total number of word forms in Czech texts: 207 mil., including 118 mil. core and 90 mil. collections
 +  * Upper Sorbian (abbreviated as hs) was added as a new language.
 +  * [[en:cnk:intercorp:verze14|Information about the corpus]]
 +
 +===== Release 13ud =====
 +
 +Published 22 December 2021
 +
 +[[https://wiki.korpus.cz/doku.php/en:cnk:intercorp:verze13ud#main_differences_between_releases_13_and_13ud | Differences between releases 13 and 13ud]]
 +
  
 ===== Release 13 =====
Line 11: Line 42:
   * Total number of word forms in Czech texts: 203 mil., including 113 mil. core and 90 mil. collections
   * Chinese is now also represented in the core part
 +  * The ReLDI tagger is now also used for tagging Slovene
   * [[en:cnk:intercorp:verze13|Information about the corpus]]
  
Line 49: Line 81:
  
   * Total number of word forms in foreign language texts: 1,483 mil., including 258 mil. core and 1,225 mil. collections
-  * Total number of tokens in Czech texts: 192 mil., including 102 mil. core and 89 mil. collections
+  * Total number of word forms in Czech texts: 192 mil., including 102 mil. core and 89 mil. collections
   * A new collection: translations of the Bible (Old and New Testament) in 18 languages
   * Update of the //Project Syndicate// collection with new texts published in the previous two years
Line 258: Line 290:
   * first stable version
 
-Last update: //8 June 2015//
+Last update: //14 January 2022//