Corpora Aranea

Aranea is a family of comparable web corpora prepared by Vladimír Benko.

Corpora available as of December 2014

Citing Aranea

Benko, V. (2015): Aranea - comparable web corpora. Ústav Českého národního korpusu FF UK, Praha. Available online: <http://www.korpus.cz>.

Benko, V. (2014): Aranea: Yet Another Family of (Comparable) Web Corpora. In: Sojka, P. – Horák, A. – Kopeček, I. – Pala, K. (eds): TSD 2014, LNAI 8655, 257–264. Springer International Publishing.

Benko, V. (2024): The Aranea Corpora Family: Ten+ Years of Processing Web-Crawled Data. In: Nöth, E. – Horák, A. – Sojka, P. (eds): Text, Speech, and Dialogue. TSD 2024. Lecture Notes in Computer Science, vol 15048. Springer, Cham. https://doi.org/10.1007/978-3-031-70563-2_5