en:pojmy:din [2019/09/27 10:28] – vaclavcvrcek (previous revision: [2019/09/27 10:22] – [How it works] vaclavcvrcek)
====== DIN ======
DIN (Difference index) is a so-called effect-size metric, i.e. a measure designed((see Fidler, M. - Cvrček, V.: {{:
===== Significance and relevance =====
When comparing values (e.g. frequencies of words), we should be aware not only of the statistical significance of a difference but also of whether the difference under consideration is actually relevant.
Even if a difference is significant, this does not necessarily entail that it is also relevant for the description. Even a small difference can be significant when a large number of measurements is available. That is why information on statistical significance is often combined with an effect-size estimate.
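To illustrate why significance alone is not enough, the following sketch compares two hypothetical situations with identical relative frequencies (110 vs. 100 occurrences per 100,000 tokens) but corpora of very different sizes. The significance test used here, the log-likelihood (G²) statistic of a 2×2 contingency table with the critical value 3.841 (p < 0.05, 1 degree of freedom), is an assumption for illustration only; the text does not prescribe a particular test. All counts and the function names `din` and `g2` are made up for this example.

```python
import math

def din(freq_target, size_target, freq_ref, size_ref):
    """Difference index (effect size) computed from raw frequencies and corpus sizes."""
    rel_target = freq_target / size_target  # RelFq(Ttxt)
    rel_ref = freq_ref / size_ref           # RelFq(RefC)
    return 100 * (rel_target - rel_ref) / (rel_target + rel_ref)

def g2(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood (G2) statistic of the 2x2 table of occurrences and
    non-occurrences of an item in the target text and the reference corpus."""
    total = size_target + size_ref
    occurrences = freq_target + freq_ref
    cells = [
        (freq_target, size_target * occurrences / total),
        (size_target - freq_target, size_target * (total - occurrences) / total),
        (freq_ref, size_ref * occurrences / total),
        (size_ref - freq_ref, size_ref * (total - occurrences) / total),
    ]
    # Empty cells contribute nothing (the usual convention 0 * log 0 = 0).
    return 2 * sum(obs * math.log(obs / expected) for obs, expected in cells if obs > 0)

CRITICAL_P05 = 3.841  # chi-square critical value for 1 df at p = 0.05

# Hypothetical counts: 110 vs. 100 occurrences per 100,000 tokens,
# observed once in small corpora and once in corpora 1,000 times larger.
for scale in (1, 1000):
    counts = (110 * scale, 100_000 * scale, 100 * scale, 100_000 * scale)
    stat = g2(*counts)
    print(f"scale {scale:>4}: G2 = {stat:7.2f}, "
          f"significant: {stat > CRITICAL_P05}, DIN = {din(*counts):.2f}")
```

Scaling every cell of the table by the same factor scales G² by exactly that factor, so the thousandfold larger sample turns a non-significant difference (G² of about 0.48) into a highly significant one (about 477), while the effect size stays the same (DIN of about 4.76).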
===== How it works =====
In the model example of extracting prominent words (keywords) from a text, we proceed as follows. For units which display a statistically significant difference, the **DIN** value is subsequently calculated:
$$DIN = 100 \times \frac{RelFq(Ttxt) - RelFq(RefC)}{RelFq(Ttxt) + RelFq(RefC)}$$
where $RelFq(Ttxt)$ is the relative frequency of the phenomenon in the analyzed text (target text) and $RelFq(RefC)$ is the relative frequency of the same phenomenon in the reference corpus.
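The keyword procedure described above can be sketched in Python as a two-step filter: test each word of the target text for significance, then compute DIN only for the words that pass and rank them by it. This is a minimal sketch under stated assumptions, not the implementation behind the original method: the significance test (log-likelihood G² with the critical value 3.841, i.e. p < 0.05 at 1 degree of freedom), the representation of the reference corpus as a dictionary of word counts, the toy data and the names `din`, `g2` and `keywords` are all hypothetical.

```python
import math
from collections import Counter

def din(freq_target, size_target, freq_ref, size_ref):
    """DIN = 100 * (RelFq(Ttxt) - RelFq(RefC)) / (RelFq(Ttxt) + RelFq(RefC)).
    Undefined (division by zero) if the item occurs in neither corpus."""
    rel_target = freq_target / size_target  # RelFq(Ttxt)
    rel_ref = freq_ref / size_ref           # RelFq(RefC)
    return 100 * (rel_target - rel_ref) / (rel_target + rel_ref)

def g2(freq_target, size_target, freq_ref, size_ref):
    """Assumed significance test: log-likelihood (G2) statistic of the 2x2 table
    of occurrences and non-occurrences in the target text and the reference corpus."""
    total = size_target + size_ref
    occurrences = freq_target + freq_ref
    cells = [
        (freq_target, size_target * occurrences / total),
        (size_target - freq_target, size_target * (total - occurrences) / total),
        (freq_ref, size_ref * occurrences / total),
        (size_ref - freq_ref, size_ref * (total - occurrences) / total),
    ]
    # Empty cells contribute nothing (the usual convention 0 * log 0 = 0).
    return 2 * sum(obs * math.log(obs / expected) for obs, expected in cells if obs > 0)

def keywords(target_tokens, ref_counts, ref_size, critical=3.841):
    """Keep words whose frequency difference is significant (G2 > critical),
    then rank them by DIN in descending order."""
    target_counts = Counter(target_tokens)
    target_size = len(target_tokens)
    scored = []
    for word, freq_target in target_counts.items():
        freq_ref = ref_counts.get(word, 0)
        if g2(freq_target, target_size, freq_ref, ref_size) > critical:
            scored.append((word, din(freq_target, target_size, freq_ref, ref_size)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy data (made up): a 200-token target text and a 1,000,000-token reference corpus.
target = ["corpus"] * 20 + ["the"] * 12 + ["word"] * 168
reference_counts = {"corpus": 50, "the": 60_000, "word": 840_000}
print(keywords(target, reference_counts, ref_size=1_000_000))
```

In the toy data, *the* and *word* have exactly the same relative frequency in both corpora and are filtered out, so only *corpus* is returned. Its DIN of about 99.9 is close to the maximum of 100, which is reached when an item does not occur in the reference corpus at all.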
The formula relates the difference between the relative frequencies (numerator) to the frequency level of the items under comparison (denominator). This reference frequency level can also be represented by their average value, as in the following formula, which is equivalent to the one above (the coefficient changes from 100 to 50):
{{:
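Writing the denominator as twice the average of the two relative frequencies shows why the coefficient 50 combined with the average gives the same value as the coefficient 100 combined with the sum:

$$DIN = 100 \times \frac{RelFq(Ttxt) - RelFq(RefC)}{RelFq(Ttxt) + RelFq(RefC)} = 50 \times \frac{RelFq(Ttxt) - RelFq(RefC)}{\frac{RelFq(Ttxt) + RelFq(RefC)}{2}}$$

Since both relative frequencies are non-negative, the absolute value of the numerator can never exceed the denominator. DIN therefore ranges from -100 (the phenomenon occurs only in the reference corpus) through 0 (equal relative frequencies) to 100 (the phenomenon occurs only in the target text):

$$-100 \le DIN \le 100$$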