Differences
This shows you the differences between two versions of the page.
en:pojmy:frekvence [2016/12/12 18:01] – [Naměřená a očekávaná frekvence] veronikapojarova | en:pojmy:frekvence [2020/08/10 16:40] (current) – [The use and significance of frequency] vaclavcvrcek | ||
---|---|---|---|
Line 34: | Line 34: | ||
* //N// is the size of the corpus in numbers of [[en: | * //N// is the size of the corpus in numbers of [[en: | ||
- | We will never know the exact probability of the phenomenon in a population of all manifestations, | + | We will never know the exact probability of the phenomenon in a population of all manifestations, |
$ p(\text{škola}) = \frac{f(\text{škola})}{N} = \frac{47872}{122419382} = 0, | $ p(\text{škola}) = \frac{f(\text{škola})}{N} = \frac{47872}{122419382} = 0, | ||
Line 47: | Line 47: | ||
The measured and expected values can then be compared, e.g. with the aid of the [[en: | The measured and expected values can then be compared, e.g. with the aid of the [[en: | ||
- | ===== Využití a význam frekvence | + | ===== The use and significance of frequency |
- | Frekvence jako základní veličina libovolné jednotky | + | Frequency as a fundamental characteristic of any unit ([[en:pojmy:typ|type]]) is used not only for determining the relations between alternating phenomena |
- | Pro korektní interpretaci frekvence je třeba si uvědomit, že se jedná o bodový odhad četnosti jevu v celém jazyce. Každý korpus je více či méně přesnou aproximací zkoumané populace | + | In order to interpret frequency correctly, it is necessary to realize that it is a point estimate of the frequency of the phenomenon in the entire language. Every corpus is a more or less precise approximation of the population in question |
- | < | + | To find the confidence interval, we use the corpus calculator **Calc** ([[https://www.korpus.cz/calc/?module=1|www.korpus.cz/calc]]), which calculates the interval using a [[wp>Binomial_distribution|binomial distribution]], |
- | <iframe id=" | + | |
- | <script> | + | |
- | (function() { | + | |
- | //////////////////////////////////////////// | + | |
- | // CONFIGURE THESE TO MATCH YOUR USE CASE // | + | |
- | //////////////////////////////////////////// | + | |
- | // this should be the root URL of the child frame (Shiny app) which you want | + | The confidence interval around the measured frequency at the confidence level of 0.95 says that in an experiment |
- | // to allow to send messages to the parent | + | |
- | var allowedOrigin = " | + | |
- | + | ||
- | /////////////////////// | + | |
- | // END CONFIGURATION // | + | |
- | /////////////////////// | + | |
- | + | ||
- | var embeddedApp = document.getElementById(" | + | |
- | + | ||
- | function resizeIframe(pixels) { | + | |
- | embeddedApp.style.height = pixels + " | + | |
- | } | + | |
- | + | ||
- | // cross-browser compatible infrastructure | + | |
- | var eventMethod = window.addEventListener ? " | + | |
- | var eventer = window[eventMethod]; | + | |
- | var messageEvent = eventMethod == " | + | |
- | + | ||
- | // listen to message from iframe | + | |
- | eventer(messageEvent, function(e) { | + | |
- | if (e.origin === allowedOrigin) { | + | |
- | var key = e.message ? " | + | |
- | var data = e[key]; | + | |
- | resizeIframe(data); | + | |
- | } else { | + | |
- | console.log(" | + | |
- | } | + | |
- | }, false); | + | |
- | + | ||
- | // send message to iframe on window resize | + | |
- | window.onresize = function() { | + | |
- | embeddedApp.contentWindow.postMessage(" | + | |
- | }; | + | |
- | })(); | + | |
- | </ | + | |
- | </ | + | |
- | + | ||
- | Konfidenční | + | |
=== Examples === | === Examples === | ||
Line 105: | Line 61: | ||
If we measure in a corpus of 100 mil. words (e.g. [[en: | If we measure in a corpus of 100 mil. words (e.g. [[en: | ||
- | If we discover that the given pheomenon | + | If we discover that the given phenomenon |
- | ===== Disperze jevů ===== | + | ===== Dispersion of phenomena |
- | V některých případech je třeba absolutní nebo relativní frekvenci doplnit ještě informací o disperzi (rozložení) daného jevu napříč textem/korpusem. I relativně velmi frekventované jevy se můžou totiž vyskytovat pouze v omezeném okruhu textů nebo v určité části dokumentu. V takových případech může být samotná frekvence jako ukazatel běžnosti prostředku údajem nespolehlivým. Za účelem kvantifikace nerovnoměrnosti rozložení slov v korpusech se užívají různé míry disperze, z nichž nejjednodušší jsou založeny na počítání počtu dokumentů, v nichž se jednotka vyskytuje, nebo autorů, kteří jí použili. Sofistikovanější způsoby zjišťování disperze prostředků využívají průměrných dílčích frekvencí v rámci jednotlivých úseků textu/korpusu, příp. počítání variačního koeficientu, | + | In some cases it is necessary to supplement absolute or relative frequency with information about the dispersion (distribution) of the given phenomenon throughout the text/corpus. Even relatively very frequent phenomena may occur only in a limited range of texts or in a certain part of a document. In such cases, frequency alone can be an unreliable indicator of how common a unit is. In order to quantify the uneven distribution of words in corpora, various measures of dispersion are used, the simplest of which are based on counting the number of documents in which the unit appears, or the number of authors who used it. More sophisticated ways of measuring dispersion use average partial frequencies within individual sections of the text/corpus, or the coefficient of variation, i.e. the ratio of the standard deviation of the frequencies in the individual sections to the average of these partial frequencies |
- | ==== Související odkazy | + | ==== Related links ==== |
<WRAP round box 49%> | <WRAP round box 49%> | ||
- | [[pojmy: | + | [[en:pojmy: |
</ | </ |
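The quantities discussed in the revised text (relative frequency $f/N$, a confidence interval around it, and the coefficient of variation as a dispersion measure) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the interval is a Wilson score interval, a common approximation for a binomial proportion, which is not necessarily the method the Calc tool uses. The figures for //škola// are the ones quoted in the formula above.

```python
import math
from statistics import mean, pstdev


def relative_frequency(f, n):
    """Relative frequency p = f / N, as in the formula for 'škola'."""
    return f / n


def wilson_interval(f, n, z=1.959964):
    """Approximate 95% confidence interval for a binomial proportion.

    Assumption: Wilson score interval with z for the 0.95 confidence
    level; the Calc tool may compute its interval differently.
    """
    p = f / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def coefficient_of_variation(partial_frequencies):
    """Standard deviation of the per-section frequencies divided by their mean."""
    return pstdev(partial_frequencies) / mean(partial_frequencies)


# Values taken from the page: f(škola) = 47872, N = 122419382.
p = relative_frequency(47872, 122419382)
low, high = wilson_interval(47872, 122419382)
ipm = p * 1_000_000  # instances per million tokens
```

A coefficient of variation of 0 means the unit is spread perfectly evenly across the sections; larger values indicate that its occurrences are concentrated in only some of them.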