Differences

This shows you the differences between two versions of the page.

en:pojmy:frekvence [2016/12/15 14:46]
Václav Cvrček [The use and significance of frequency]
en:pojmy:frekvence [2016/12/15 14:47] (current)
Václav Cvrček [Measured and expected frequency]
Line 34:
  * //N// is the size of the corpus in numbers of [[en:pojmy:token|tokens]]
  
-We will never know the exact probability of the phenomenon in a population of all manifestations, but it can be approximated by the relative frequency discovered in previous comparisons using different data (other corpora). In the [[en:cnk:syn2005|SYN2005]] corpus we can therefore determine the probability of the occurrence of the [[en:pojmy:lemma|lemma]] //škola// from its frequency (f = 47872) and from the total size of the corpus (N = 122419382):
+We will never know the exact probability of the phenomenon in a population of all manifestations, but it can be approximated by the relative frequency discovered in previous comparisons using different data (other corpora). In the [[en:cnk:syn2005|SYN2005]] corpus we can therefore determine the probability of the occurrence of the [[en:pojmy:lemma|lemma]] //škola// ('school') from its frequency (f = 47872) and from the total size of the corpus (N = 122419382):
  
 $ p(\text{škola}) = \frac{f(\text{škola})}{N} = \frac{47872}{122419382} = 0.0003910492 = 3.91 \cdot 10^{-4} $
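
The estimate above can be checked with a few lines of Python. This is only a minimal sketch: the function name `relative_frequency` and the constant names are illustrative assumptions, not part of any corpus tool. The two numbers are the ones quoted above for the lemma //škola// in SYN2005.

```python
def relative_frequency(f: int, n: int) -> float:
    """Return f / n, the relative frequency used to approximate a probability.

    f is the absolute frequency of the item, n is the corpus size in tokens.
    (Hypothetical helper, named here for illustration only.)
    """
    if n <= 0:
        raise ValueError("corpus size N must be positive")
    if not 0 <= f <= n:
        raise ValueError("frequency f must lie between 0 and N")
    return f / n


# Values quoted in the text for the lemma 'škola' in SYN2005.
F_SKOLA = 47872
N_SYN2005 = 122419382

p_skola = relative_frequency(F_SKOLA, N_SYN2005)

print(f"{p_skola:.10f}")  # 0.0003910492
print(f"{p_skola:.2e}")   # 3.91e-04
```

The helper rejects a non-positive corpus size and a frequency larger than the corpus, since neither can yield a value that is interpretable as a probability.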