Differences
This shows you the differences between two versions of the page.
en:pojmy:frekvence [2016/12/15 14:46] Václav Cvrček [The use and significance of frequency] |
en:pojmy:frekvence [2016/12/15 14:47] (current) Václav Cvrček [Measured and expected frequency] |
||
---|---|---|---|
Line 34: | Line 34: | ||
* //N// is the size of the corpus in numbers of [[en:pojmy:token|tokens]] | * //N// is the size of the corpus in numbers of [[en:pojmy:token|tokens]] | ||
- | We will never know the exact probability of the phenomenon in a population of all manifestations, but it can be approximated by the relative frequency discovered in previous comparisons using different data (other corpora). In the [[en:cnk:syn2005|SYN2005]] corpus we can therefore determine the probability of the occurrence of the [[en:pojmy:lemma|lemma]] //škola// from its frequency (f = 47872) and from the total size of the corpus (N = 122419382): | + | We will never know the exact probability of the phenomenon in a population of all manifestations, but it can be approximated by the relative frequency discovered in previous comparisons using different data (other corpora). In the [[en:cnk:syn2005|SYN2005]] corpus we can therefore determine the probability of the occurrence of the [[en:pojmy:lemma|lemma]] //škola// ('school') from its frequency (f = 47872) and from the total size of the corpus (N = 122419382): |
$ p(\text{škola}) = \frac{f(\text{škola})}{N} = \frac{47872}{122419382} \approx 0.0003910492 \approx 3.91 \cdot 10^{-4} $ | $ p(\text{škola}) = \frac{f(\text{škola})}{N} = \frac{47872}{122419382} \approx 0.0003910492 \approx 3.91 \cdot 10^{-4} $ |
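
The calculation above can be reproduced with a few lines of code. The following is only a minimal sketch: the function name `relative_frequency` and the variable names are illustrative assumptions, not part of any corpus tool, and the two numbers are the SYN2005 figures quoted in the text.

```python
def relative_frequency(f: int, n: int) -> float:
    """Estimate the probability of a phenomenon as f / N.

    f -- absolute frequency of the phenomenon (here: the lemma "škola")
    n -- size of the corpus in tokens
    (Hypothetical helper, named for illustration only.)
    """
    if n <= 0:
        raise ValueError("corpus size N must be positive")
    if not 0 <= f <= n:
        raise ValueError("frequency f must lie between 0 and N")
    return f / n


# Figures for the lemma "škola" in SYN2005, as given in the text.
f_skola = 47872
corpus_size = 122419382

p_skola = relative_frequency(f_skola, corpus_size)

print(f"p(škola) = {p_skola:.10f}")  # fixed-point notation, 10 decimal places
print(f"p(škola) = {p_skola:.2e}")   # scientific notation, 2 decimal places
```

The range checks reflect that a relative frequency used as a probability estimate can only lie between 0 and 1.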