Differences

This shows you the differences between two versions of the page.

en:cnk:aibrown [2025/09/24 02:06] jirimilicka
en:cnk:aibrown [2025/10/13 14:10] (current) – [How to cite AI-Brown] jirimilicka
Line 6: Line 6:
  
 <WRAP right 40%> <WRAP right 40%>
-^ <fs medium>Name</fs> ^^ <fs medium>AI-Brown</fs> ^
+^ <fs medium>Name</fs> ^^ <fs medium>AI-Brown v1</fs> ^
 ^ Positions ^ Number of positions (tokens) |  27 661 454 |   ^ Positions ^ Number of positions (tokens) |  27 661 454 |  
 ^ ::: ^ Number of positions (excl. punctuation) |  23 975 982 | ^ ::: ^ Number of positions (excl. punctuation) |  23 975 982 |
Line 22: Line 22:
  
  
-The original reference BE21 Corpus was available in vertical format via the Czech National Corpus infrastructure. The preprocessing pipeline included several steps to prepare the data for prompt-based generation. Clean texts and metadata were extracted from the verticals, and structural tags were aligned with the Czech corpus format to ensure cross-linguistic consistency.
+The preprocessing pipeline for the original reference BE21 corpus included several steps to prepare the data for prompt-based generation. Clean texts and metadata were extracted from the verticals, and structural tags were aligned with the Czech corpus format to ensure cross-linguistic consistency.
  
 Each BE21 text sample was split into two parts to support controlled generation: Each BE21 text sample was split into two parts to support controlled generation:
Line 53: Line 53:
  
  
-==== How to cite AI-Koditex ====
+==== How to cite AI-Brown ====
  
 <WRAP round tip 70%> <WRAP round tip 70%>
-Milička, J. – Marklová, A. – Cvrček, V.: //AI-Brown//. Department of Linguistics, Faculty of Arts, Charles University, Prague 2025. Available at WWW: www.korpus.cz
+Milička, J. – Marklová, A. – Cvrček, V. (2025): //AI Brown and AI Koditex: LLM-Generated Corpora Comparable to Traditional Corpora of English and Czech Texts//. Arxiv preprint: [[https://arxiv.org/abs/2509.22996]]
+
+Milička, J. – Marklová, A. – Cvrček, V.: //AI-Brown, version 1, 1. 7. 2025//. Department of Linguistics, Faculty of Arts, Charles University, Prague 2025. Available at WWW: www.korpus.cz
 </WRAP> </WRAP>