
Differences

Here you can see the differences between the selected revision and the current version of the page.

cnk:aikoditex [2025/08/22 14:43] – [AI-Koditex] jirimilicka
cnk:aikoditex [2025/10/13 14:08] (current) – [How to cite AI-Koditex] jirimilicka
Line 5:

 <WRAP right 35%>
-^ <fs medium>Name</fs> ^^ <fs medium>AI-Koditex</fs> ^
+^ <fs medium>Name</fs> ^^ <fs medium>AI-Koditex v1</fs> ^
 ^ Positions ^ Number of positions (tokens) |  24 030 795 |
 ^ ::: ^ Number of positions (excl. punctuation) |  20 180 737 |
Line 25:
   * Prompt portion: The first 500 words (including punctuation) served as generation prompts

-  * Reference portion: The remaining text (approximately 1,500 words) provided human-authored comparison material
+  * Reference portion: The remaining text (approximately 1,500 words) can provide human-authored comparison material

 This segmentation strategy ensured that models received sufficient context for generation while maintaining substantial reference text for comparative analysis. Also, the context of 500 words left sufficient space in the context window even for older models (davinci-002 has maximum context of 2049 tokens, while 500 English words takes about 670 tokens).
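
The prompt/reference split and the context-window arithmetic described in the hunk above can be illustrated with a minimal Python sketch. It is not the authors' actual pipeline: the names `UNIT_RE`, `split_prompt_reference`, and `remaining_context` are hypothetical, and the rule that each punctuation mark counts as one unit is only an assumed reading of "500 words (including punctuation)". The figures 2049 and 670 are taken from the paragraph itself.

```python
import re

# Assumption: a "word" is either a run of word characters or a single
# punctuation mark, so punctuation counts toward the 500-unit prompt.
UNIT_RE = re.compile(r"\w+|[^\w\s]")


def split_prompt_reference(text: str, prompt_units: int = 500) -> tuple[str, str]:
    """Return (prompt, reference), cutting after the first `prompt_units` units."""
    cut = len(text)  # texts shorter than the limit become prompt only
    for count, match in enumerate(UNIT_RE.finditer(text), start=1):
        if count == prompt_units:
            cut = match.end()
            break
    return text[:cut], text[cut:].lstrip()


def remaining_context(prompt_tokens: int, max_context: int = 2049) -> int:
    """Tokens left for generation once the prompt is in the window."""
    return max_context - prompt_tokens


# Tiny demonstration with a 4-unit prompt instead of 500.
prompt, reference = split_prompt_reference("Hello, world! This is a test.", prompt_units=4)
print(repr(prompt), repr(reference))

# With the figures quoted above: 2049 - 670 tokens remain for generation.
print(remaining_context(670))
```

Cutting on character offsets rather than rejoining tokens keeps the original spacing of the prompt intact, which matters if the prompt is passed verbatim to a completion model.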
Line 55:

 ==== How to cite AI-Koditex ====
-
 <WRAP round tip 70%>
-Milička, J. – Marklová, A. – Cvrček, V.: //AI-Koditex//. Department of Linguistics, Faculty of Arts, Charles University, Prague 2025. Available at WWW: www.korpus.cz
+Milička, J. – Marklová, A. – Cvrček, V. (2025): //AI Brown and AI Koditex: LLM-Generated Corpora Comparable to Traditional Corpora of English and Czech Texts//. Arxiv preprint: [[https://arxiv.org/abs/2509.22996]]
+
+
+Milička, J. – Marklová, A. – Cvrček, V.: //AI-Koditex, version 1, 1. 7. 2025//. Department of Linguistics, Faculty of Arts, Charles University, Prague 2025. Available at WWW: www.korpus.cz
 </WRAP>