Differences

This shows you the differences between two versions of the page.

en:cnk:lestrepublicain [2015/10/24 11:32], Václav Horký: created
en:cnk:lestrepublicain [2016/12/16 16:07] (current), Michal Škrabal: [Corpus lEstRepublicain]
Line 2:
 ====== Corpus lEstRepublicain ======
  
-Corpus consists of 3 volumes (1999, 2002, 2003; not all of them complete) of French regional newspaper L'Est Républicain. It contains almost 120 million words and it was built from [[http://www.cnrtl.fr/corpus/estrepublicain/|CNRTL data]]. The corpus is lemmatised and POS-tagged by [[http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/|TreeTagger]].
+Corpus consists of 3 volumes (1999, 2002, 2003; not all of them complete) of French regional newspaper L'Est Républicain. After the deduplication it contains almost 73 million words in version 2 (v1 had almost 120 million words) and it was built from [[http://www.cnrtl.fr/corpus/estrepublicain/|CNRTL data]]. The corpus is lemmatised and POS-tagged by [[http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/|TreeTagger]].
  
 For technical reasons, corpus lEstRepublicain is not included in the standard corpus list for Bonito 1; it is only available via the web interface.