en:cnk:frwac [2015/10/24 11:53] (current)
Václav Horký created
Line 1: Line 1:
 +~~NOTOC~~
 +====== Corpus frWaC ======
 +
 +**frWaC** is a 1.6-billion-word corpus built from the Web by limiting the crawl to the **.fr** domain and using medium-frequency words from the Le Monde Diplomatique corpus and basic French vocabulary lists as seeds. The corpus was POS-tagged and lemmatized with the [[http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/|TreeTagger]]; more information is available [[http://wacky.sslmit.unibo.it/lib/exe/fetch.php?media=papers:wacky_2008.pdf|here]].((Copied from: http://wacky.sslmit.unibo.it/doku.php?id=corpora#french.))
 +
 +
 +===== Citing frWaC =====
 +
 +<WRAP round tip 49%>
 +A. Ferraresi, S. Bernardini, G. Picci and M. Baroni (2010) “Web Corpora for Bilingual Lexicography: A Pilot Study of English/French Collocation Extraction and Translation”. In Xiao, R. (ed.) Using Corpora in Contrastive and Translation Studies. Newcastle: Cambridge Scholars Publishing. ([[http://wacky.sslmit.unibo.it/lib/exe/fetch.php?media=ferraresi_et_al_2010.pdf|PDF to download]])
 +</WRAP>
 +
 +
 +====== See also ======
 +<WRAP round box 49%>
 +[[en:cnk:dewac|deWaC]] • [[en:cnk:itwac|itWaC]] • [[en:cnk:ukwac|ukWaC]]
 +</WRAP>
 +