Differences
This shows you the differences between two versions of the page.
en:pojmy:arf [2016/12/12 15:45] – created veronikapojarova | en:pojmy:arf [2016/12/12 15:51] – [Reduced frequency and ARF] veronikapojarova | ||
---|---|---|---|
Line 9: | Line 9: | ||
Its definition is as follows: let //f// denote the frequency of a given word in the corpus. We divide the positions in the entire corpus into //f// sections of equal size. If the total number of words in the corpus is divisible by //f//, the sections are all the same size; otherwise their sizes may differ by one position. The reduced frequency is then the number of sections in which the given word occurs at least once. | Its definition is as follows: let //f// denote the frequency of a given word in the corpus. We divide the positions in the entire corpus into //f// sections of equal size. If the total number of words in the corpus is divisible by //f//, the sections are all the same size; otherwise their sizes may differ by one position. The reduced frequency is then the number of sections in which the given word occurs at least once. | ||
- | The first word from our example will have a reduced frequency of either 1, if all of its occurrences fall into a single section, or 2, if the boundary between two sections happens to lie in the middle of the cluster of occurrences. The second word will have a much higher reduced frequency. In the extreme case, the reduced frequency can theoretically equal the frequency, namely exactly when each occurrence of the given word falls into a separate section. In practice this mostly does not happen, at least not for words with a higher frequency. | + | The first word from our example will have a reduced frequency of either |
The average reduced frequency (ARF) is derived from the reduced frequency by taking into account all possible compilations of the corpus (the order of the texts in it): it is calculated as the average of the reduced frequency over all possible compilations of the corpus. | The average reduced frequency (ARF) is derived from the reduced frequency by taking into account all possible compilations of the corpus (the order of the texts in it): it is calculated as the average of the reduced frequency over all possible compilations of the corpus. |
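
The definitions in the diff above can be illustrated with a small sketch. It is not taken from the page itself and rests on assumptions: word occurrences are given as 0-based token positions, the corpus is treated as cyclic, and "all possible compilations" is simplified to all cyclic shifts of the section boundaries, which is how ARF is commonly formulated. The function names (`reduced_frequency`, `arf_brute_force`, `arf_closed_form`) are hypothetical. The closed form used is ARF = (1/v) · Σ min(d_i, v), where v = N/f and d_i are the cyclic distances between consecutive occurrences.

```python
def reduced_frequency(positions, corpus_size, offset=0):
    """Number of sections containing at least one occurrence.

    The corpus is split into f = len(positions) sections whose sizes
    differ by at most one position; `offset` (an assumption of this
    sketch) cyclically shifts where the first section starts.
    """
    f = len(positions)
    if f == 0:
        return 0
    # floor(q * f / N) maps a shifted position q to its section index
    sections = {((p - offset) % corpus_size) * f // corpus_size
                for p in positions}
    return len(sections)


def arf_brute_force(positions, corpus_size):
    """Average the reduced frequency over every cyclic shift."""
    total = sum(reduced_frequency(positions, corpus_size, s)
                for s in range(corpus_size))
    return total / corpus_size


def arf_closed_form(positions, corpus_size):
    """ARF via (1/v) * sum(min(d_i, v)) with v = N / f."""
    f = len(positions)
    if f == 0:
        return 0.0
    v = corpus_size / f
    ps = sorted(positions)
    # distances between consecutive occurrences, plus the wrap-around gap
    gaps = [ps[i] - ps[i - 1] for i in range(1, f)]
    gaps.append(corpus_size - ps[-1] + ps[0])
    return sum(min(d, v) for d in gaps) / v


# Toy corpus of 12 tokens; both words have frequency f = 3.
clustered = [0, 1, 2]   # all occurrences in one cluster
spread = [0, 4, 8]      # occurrences evenly distributed

print(reduced_frequency(clustered, 12), reduced_frequency(spread, 12))
print(arf_closed_form(clustered, 12), arf_closed_form(spread, 12))
```

In this toy case the clustered word falls into one or two sections depending on where the boundaries land, while the evenly spread word always occupies all three, so its ARF equals its plain frequency. The brute-force and closed-form versions agree here because the corpus size is divisible by //f//; for other sizes the sketch makes no such claim.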