en:pojmy:din [2019/09/27 10:07] – [DIN] vaclavcvrcek | en:pojmy:din [2019/09/27 10:28] – vaclavcvrcek
====== DIN ======
DIN (Difference index) is a so-called effect-size metric, i.e. a measure designed((see Fidler, M. - Cvrček, V.: {{:

===== Significance and relevance =====

When comparing values (e.g. word frequencies), we should consider not only whether a difference is statistically significant but also whether it is actually relevant for the description. Statistical significance can be assessed with several tests (e.g. the chi-squared test, Fisher'

A significant difference is not necessarily relevant for the description: when many measurements are available, even a very small difference can be significant. That is why information on statistical significance is often combined with an effect-size estimate.
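To illustrate the point, here is a minimal Python sketch. It assumes a Pearson chi-squared test (without continuity correction) on a 2×2 contingency table, chosen only as one example of a significance test, and uses entirely hypothetical counts; the function names `chi2_2x2` and `din_from_counts` are our own. The same pair of relative frequencies (1,050 vs. 1,000 per million) is tested once in small samples and once in very large ones.

```python
import math


def chi2_2x2(count_a: int, total_a: int, count_b: int, total_b: int) -> tuple[float, float]:
    """Pearson chi-squared test (no continuity correction) on a 2x2 table.

    Rows: corpus A, corpus B; columns: the item, all other tokens.
    Returns (chi2 statistic, p-value for 1 degree of freedom).
    """
    observed = [
        [count_a, total_a - count_a],
        [count_b, total_b - count_b],
    ]
    grand_total = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [count_a + count_b, grand_total - count_a - count_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, chi2 = Z^2, so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def din_from_counts(count_a: int, total_a: int, count_b: int, total_b: int) -> float:
    """DIN of corpus A relative to corpus B, computed from raw counts."""
    rel_a = count_a / total_a
    rel_b = count_b / total_b
    return 100 * (rel_a - rel_b) / (rel_a + rel_b)


# Hypothetical data: identical relative frequencies in small and very large samples.
for count_a, count_b, size in [(21, 20, 20_000), (105_000, 100_000, 100_000_000)]:
    chi2, p = chi2_2x2(count_a, size, count_b, size)
    dv = din_from_counts(count_a, size, count_b, size)
    print(f"size={size:>11,}  DIN={dv:.2f}  chi2={chi2:.2f}  p={p:.3g}")
```

DIN stays at about 2.44 in both cases, while the p-value moves from clearly non-significant in the small samples to vanishingly small in the large ones: significance reflects the amount of data, the effect size reflects the magnitude of the difference.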

===== How it works =====
In the model example
$$DIN = 100 \times \frac{RelFq(Ttxt) - RelFq(RefC)}{RelFq(Ttxt) + RelFq(RefC)}$$
where $RelFq(Ttxt)$ is the relative frequency of the phenomenon in the analyzed text (target text) and $RelFq(RefC)$ is the relative frequency of the same phenomenon in the reference corpus.
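The calculation can be sketched in Python as follows. This is a minimal illustration of the formula, not the implementation used by any particular tool; the function names `relative_frequency` and `din`, the per-million normalisation and the counts are our own hypothetical choices.

```python
def relative_frequency(count: int, total: int, per: int = 1_000_000) -> float:
    """Relative frequency of an item, normalised to `per` tokens (per million by default)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total * per


def din(rel_fq_target: float, rel_fq_ref: float) -> float:
    """DIN = 100 * (RelFq(Ttxt) - RelFq(RefC)) / (RelFq(Ttxt) + RelFq(RefC))."""
    denominator = rel_fq_target + rel_fq_ref
    if denominator == 0:
        # The item occurs in neither source, so DIN is undefined.
        raise ValueError("item occurs in neither the target text nor the reference corpus")
    return 100 * (rel_fq_target - rel_fq_ref) / denominator


# Hypothetical counts: 12 occurrences in a 2,000-token target text
# vs. 3,000 occurrences in a 1,000,000-token reference corpus.
ttxt = relative_frequency(12, 2_000)          # about 6,000 per million
refc = relative_frequency(3_000, 1_000_000)   # about 3,000 per million
print(round(din(ttxt, refc), 2))  # 33.33
```

Because the normalisation factor appears in both the numerator and the denominator, it cancels out: DIN is the same whether relative frequencies are expressed per token, per thousand or per million, as long as both sides use the same base.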

The formula relates the difference between the two relative frequencies (numerator) to the overall frequency level of the compared items (denominator). This reference level can also be expressed as their mean, as the following formula shows:

{{:
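
Spelled out: since the sum of two values is twice their mean, replacing the denominator of the DIN formula by twice the mean relative frequency gives an equivalent form (a short derivation from the formula in "How it works"):

$$DIN = 100 \times \frac{RelFq(Ttxt) - RelFq(RefC)}{RelFq(Ttxt) + RelFq(RefC)} = 50 \times \frac{RelFq(Ttxt) - RelFq(RefC)}{\frac{RelFq(Ttxt) + RelFq(RefC)}{2}}$$

In other words, DIN equals half of the difference between the two relative frequencies, expressed as a percentage of their mean.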

===== DIN values =====

DIN values range from -100 to 100, where:
  * a value of -100 means that the given phenomenon occurs only in the reference corpus and not at all in the analyzed text (the word is therefore not prominent in the analyzed text)
  * a value of 0 means that the given phenomenon has approximately the same relative frequency in the analyzed text and in the reference corpus (the word is therefore not prominent in the analyzed text)