Differences
This shows you the differences between two versions of the page.
en:pojmy:frekvence [2016/12/15 14:41] – [The use and significance of frequency] vaclavcvrcek | en:pojmy:frekvence [2016/12/15 14:46] – [The use and significance of frequency] vaclavcvrcek |
---|
===== The use and significance of frequency ===== | ===== The use and significance of frequency ===== |
| |
Frequency as a fundamental value of an arbitrary ([[en:pojmy:typ|type]]) and langue (system) characteristic is used not only for determining the relations between alternating phenomena (e.g. the frequency of morphological variants //bychom// and //bysme//, as in [[http://syd.korpus.cz/05xNuUX8.syn|SyD]]), but it also serves the compilation of dictionaries (defining the most frequent words as core vocabulary), the extraction of [[en:pojmy:kolokace|collocations]], the evaluation of grammatical categories, the identification of [[en:pojmy:keyword|keywords]] in texts etc. | Frequency as a fundamental value of an arbitrary unit ([[en:pojmy:typ|type]]) and langue (system) characteristic is used not only for determining the relations between alternating phenomena (e.g. the frequency of morphological variants //bychom// and //bysme//, as in [[http://syd.korpus.cz/05xNuUX8.syn|SyD]]), but it also serves the compilation of dictionaries (defining the most frequent words as core vocabulary), the extraction of [[en:pojmy:kolokace|collocations]], the evaluation of grammatical categories, the identification of [[en:pojmy:keyword|keywords]] in texts etc. |
| |
In order to interpret frequency correctly it is necessary to realize that it is a point estimate if the frequency of phenomena in the entire language. Every corpus is more or less a precise approximation of the population in question (=texts of a certain nature), and therefore in different corpora created using the same methodology (even if it were possible to guarantee their full comparability) the frequencies of the desired phenomenon will differ slightly. This variability can be captured using the **[[wp>Confidence_interval|confidence interval]]** which gives the span containing (with a certain probability) the frequency of a given phenomenon. For finding out the confidence interval we use a [[wp>Binomial_distribution|binomial distribution]], the input values being the frequency of the phenomenon, the size of the corpus and the significance level (expressing a tolerable error rate). | In order to interpret frequency correctly it is necessary to realize that it is a point estimate of the frequency of phenomena in the entire language. Every corpus is more or less a precise approximation of the population in question (=texts of a certain nature), and therefore in different corpora created using the same methodology (even if it were possible to guarantee their full comparability) the frequencies of the desired phenomenon will differ slightly. This variability can be captured using the **[[wp>Confidence_interval|confidence interval]]** which gives the span containing (with a certain probability) the frequency of a given phenomenon. For finding out the confidence interval we use a [[wp>Binomial_distribution|binomial distribution]], the input values being the frequency of the phenomenon, the size of the corpus and the significance level (expressing a tolerable error rate). |
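As a rough illustration of the calculation described above, the following Python sketch takes the three inputs named in the text (the frequency of the phenomenon, the size of the corpus, and the tolerated error rate, passed here as the confidence level, i.e. 1 minus the significance level). It uses the Wilson score interval, a common approximation to the binomial interval that stays practical for corpus-sized samples; this is an assumption made for the example, not necessarily the method used by the wiki's authors. The function name `wilson_interval` and the figures in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(freq, corpus_size, confidence=0.95):
    """Return (lower, upper) bounds for the relative frequency freq / corpus_size.

    Wilson score interval: an approximation to the binomial confidence
    interval, chosen for this sketch (an assumption, not the source's method).
    """
    if corpus_size <= 0 or not 0 <= freq <= corpus_size:
        raise ValueError("need corpus_size > 0 and 0 <= freq <= corpus_size")
    if not 0 < confidence < 1:
        raise ValueError("confidence must lie strictly between 0 and 1")

    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = freq / corpus_size  # point estimate of the relative frequency

    denom = 1 + z * z / corpus_size
    centre = (p + z * z / (2 * corpus_size)) / denom
    half_width = z * sqrt(
        p * (1 - p) / corpus_size + z * z / (4 * corpus_size ** 2)
    ) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical figures: 1,200 hits in a corpus of 100 million tokens.
low, high = wilson_interval(1_200, 100_000_000)
print(f"{low * 1e6:.2f} to {high * 1e6:.2f} occurrences per million tokens")
```

The bounds are relative frequencies; multiplying them by one million gives the familiar instances-per-million scale. Raising the confidence level (for example to 0.99) widens the interval, which reflects the trade-off between certainty and precision.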
| |
The confidence interval around the measured frequency on the significance level of 0,95 says that in an experiment which would encompass an infinite number of comparable corpora of the same size, the frequency of the given phenomenon would within this interval in 95% of measurements. When conducting our analysis we should always be aware that the actual frequency of a phenomenon can acquire any value from the confidence interval. | The confidence interval around the measured frequency at the confidence level of 0.95 (i.e. a significance level of 0.05) says that in an experiment which would encompass an infinite number of comparable corpora of the same size, the frequency of the given phenomenon would be within this interval in 95% of measurements. When conducting our analysis we should always be aware that the actual frequency of a phenomenon can acquire any value from the confidence interval. |
| |
=== Examples === | === Examples === |