en:manualy:kontext:frekvence [2022/05/19 14:13] – jankocek | en:manualy:kontext:frekvence [2023/03/13 14:17] (current) – [Custom settings of frequency distribution] lukes |
---|
==== Table view ==== | ==== Table view ==== |
| |
[{{ :en::manualy:kontext:fqdist-word-drevo_tab_en.png?direct&350|Frequency list of the word forms of the lemma //dřevo// (including representations of confidence intervals) }}] | [{{ :en::manualy:kontext:fqdist-word-drevo_tab_en.png?direct&400|Frequency list of the word forms of the lemma //dřevo// (including representations of confidence intervals) }}] |
| |
The default display is a table showing the absolute and relative frequencies of the individual items (including the option to display confidence intervals). | The default display is a table showing the absolute and relative frequencies for each item (including the option to display confidence intervals). |
| |
Different kinds of information appear next to every [[en:pojmy:word|word]] (attribute) displayed in the frequency list of the lemma //dřevo//. The basic information is located in the frequency column, which shows the absolute frequency of a given item in the searched concordance (if the concordance was altered in some way before the frequency list was requested - e.g. with filters - the frequency list changes accordingly). All items with at least one occurrence are shown in the list. To narrow the list down, we can set **Minimum frequency** to a value that suits the specific situation. | Different kinds of information appear next to every [[en:pojmy:word|word]] (attribute) displayed in the frequency list of the lemma //dřevo//. The basic information is located in the frequency column, which shows the absolute frequency of a given item in the searched concordance (if the concordance was altered in some way before the frequency list was requested -- e.g. with filters -- the frequency list changes accordingly). All items with at least one occurrence are displayed in the list. To narrow the list down, we can set **Minimum Frequency** to a value that suits the specific situation. |
| |
Next to the absolute frequency column there is also the [[pojmy:ipm|i.p.m.]] item. It expresses the **relative frequency** of the studied phenomena with respect to the total size of the corpus. In our case, the form //dřevem// appears in the corpus [[cnk:syn2020|SYN2020]] with an absolute frequency of 5,712, which represents 46.89 occurrences per million words (i.p.m.). | The [[en:pojmy:ipm|i.p.m.]] column next to the absolute frequency column expresses the **relative frequency** of the studied phenomena with respect to the total size of the corpus. In our case, the form //dřeva// appears in the corpus [[en:cnk:syn2020|SYN2020]] with an absolute frequency of 5,712, which represents 46.89 occurrences per million words (i.p.m.). |
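The i.p.m. value described above is a normalization of the absolute frequency by the corpus size. The following is a minimal Python sketch of that arithmetic; the function name ''ipm'' and the numbers in the usage line are illustrative assumptions, not KonText code and not figures taken from SYN2020.

```python
def ipm(abs_freq: int, corpus_size: int) -> float:
    """Return the relative frequency in instances per million tokens."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    if abs_freq < 0:
        raise ValueError("abs_freq must not be negative")
    # Multiply first so that round figures stay exact in floating point.
    return abs_freq * 1_000_000 / corpus_size


# Illustrative numbers only: 250 hits in a hypothetical 5-million-token corpus.
print(ipm(250, 5_000_000))  # 50.0
```

Because the value is scaled to a fixed base of one million tokens, results obtained from corpora of different sizes can be compared directly.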
| |
For both absolute and relative frequency values, an additional option can display the values of **[[pojmy:konfidencni_intervaly|confidence intervals]]**, i.e. the ranges within which the given frequencies would fall (with a probability given by the chosen **confidence level**) in other, similarly constructed corpora of comparable size. The confidence level is set to 95% and the user can change it to 99% or 90%. | For both absolute and relative frequency values, an additional option can be used to display the values of **confidence intervals**, i.e. the ranges within which the given frequencies would fall (with a probability given by the chosen **confidence level**) in other, similarly constructed corpora of comparable size. The confidence level is set to 95% by default and can be changed to 99% or 90%. |
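To make the idea of a confidence interval for a frequency more concrete, here is a hedged Python sketch. It uses the Wilson score interval for a proportion, which is one common choice; this manual does not say which method KonText actually uses, so the method, the function names and the sample numbers are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def wilson_interval(hits: int, corpus_size: int, level: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval for the proportion hits / corpus_size."""
    if not 0 < level < 1:
        raise ValueError("level must be between 0 and 1")
    if corpus_size <= 0 or not 0 <= hits <= corpus_size:
        raise ValueError("need 0 <= hits <= corpus_size and corpus_size > 0")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = hits / corpus_size
    denom = 1 + z * z / corpus_size
    centre = (p + z * z / (2 * corpus_size)) / denom
    half = z * sqrt(p * (1 - p) / corpus_size + z * z / (4 * corpus_size ** 2)) / denom
    return centre - half, centre + half


def ipm_interval(hits: int, corpus_size: int, level: float = 0.95) -> Tuple[float, float]:
    """The same interval rescaled to instances per million tokens."""
    low, high = wilson_interval(hits, corpus_size, level)
    return low * 1_000_000, high * 1_000_000


# Illustrative numbers only: 250 hits in a hypothetical 5-million-token corpus.
for level in (0.90, 0.95, 0.99):
    print(level, ipm_interval(250, 5_000_000, level))
```

A higher confidence level yields a wider interval, which matches the behaviour one would expect when switching between the 90%, 95% and 99% settings.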
| |
To the left of each word in the list are the **p/n** links, which quickly apply a positive or negative [[en:manualy:kontext:filtr|filter]]. Clicking **p** in the row showing the frequency of the word //dřevem// keeps only the occurrences of this form in the current concordance; clicking **n** instead removes all occurrences of the given form from the current concordance. | To the left of each word in the list are the **p/n** links, which quickly apply a positive or negative [[en:manualy:kontext:filtr|filter]]. Clicking **p** in the row showing the frequency of the word //dřeva// keeps only the occurrences of this form in the current concordance; clicking **n** instead removes all occurrences of the given form from the current concordance. |
| |
After clicking on the heading of the column, the table will automatically be rearranged according to the selected column. This way it is possible to create a list that is arranged alphabetically (in addition to the usual list arranged according to the frequency). | After clicking on the heading of the column, the table will automatically be rearranged according to the selected column. This way, it is possible to create a list that is arranged alphabetically (in addition to the usual list arranged according to the frequency). |
| |
| The **Share the table** function (the link is placed in the row above the table) generates a permanent link to the table, which can be sent directly from the form window to the specified e-mail address or later mentioned in an article, study, etc. |
| |
==== Chart view ==== | ==== Chart view ==== |
| |
The graphical display visualizes the information presented in the previous section (absolute and relative frequencies of items with their confidence intervals) in the form of two types of graphs: a horizontal **bar chart** and a "**word cloud**" graph. | The graphical display allows you to visualize the information presented in the previous section (absolute and relative frequencies of items with their confidence intervals) in the form of two types of graphs: either a horizontal **bar chart** or a "**word cloud**" graph. |
| |
[{{:en::manualy:kontext:fqdist-word-drevo_en.png?direct&350|Visualization type: bar }}] | [{{:en::manualy:kontext:fqdist-word-drevo_en.png?direct&350|Visualization type: bar }}] |
\\ | \\ |
By default, a bar chart with relative frequency values, including confidence intervals at the 95% level, is displayed. | By default, a bar chart with relative frequencies including 95% confidence intervals is displayed. |
| |
After expanding the options above the graph with **(+)**, the properties of the graph can be modified. Absolute frequencies can be displayed instead of relative frequency values, the number of items in the graph can be limited, the items can be sorted alphabetically instead of by frequency, and the graph can be exported as an image. | After expanding the options above the graph with **(+)**, you can modify the properties of the graph. You can display absolute frequencies instead of relative frequency values, limit the number of items in the graph, sort items alphabetically instead of by frequency, and export the graph as an image. |
| |
Finally, the graph can be switched to a "word cloud", which displays the group of examined items (in our example, word forms) in sizes proportional to their frequencies. For this type of graph, only the options to export the graph and to limit the number of items in the graph are relevant in the user settings. | Finally, the graph can be switched to a "word cloud", which displays the group of examined items (in our example, word forms) in sizes proportional to their frequencies. For this type of graph, only the options to export the graph and to limit the number of items in the graph are relevant in the user settings. |
| |
[{{:en::manualy:kontext:fqdist-word-cloud_en.png?direct&350|Visualization type: Word cloud }}] | [{{:en::manualy:kontext:fqdist-word-cloud_en.png?direct&350|Visualization type: Word cloud }}] |
===== Custom settings of frequency distribution ===== | ===== Custom settings of frequency distribution ===== |
| |
The form which appears after clicking on the option **Frequency distribution → Custom** consists of three sections: | The form which appears after clicking on the menu item **Frequency → Custom** offers four options: |
| |
- form for multilevel frequency distribution (which can be used to analyze [[en:pojmy:atributy_pozicni|positional attributes]] such as word, lemma, sublemma, tag, verbtag, etc.) | - multilevel frequency distribution (which can be used to analyze [[en:pojmy:atributy_pozicni|positional attributes]] such as word, lemma, sublemma, tag, verbtag, etc.) |
- form for frequency distribution according to the [[en:pojmy:atributy_strukturni|structural attributes]] (such as ''[[en:pojmy:txtype|txtype]]'', ''[[en:pojmy:medium|med]]'' or ''[[en:pojmy:srclang|srclang]]'') | - frequency distribution according to the [[en:pojmy:atributy_strukturni|structural attributes]] (such as ''[[en:pojmy:txtype|txtype]]'', ''[[en:pojmy:medium|med]]'' or ''[[en:pojmy:srclang|srclang]]'') |
- form for frequency distribution reflecting the relationship between two attributes (both positional and structural attributes) | - dispersion plot showing the distribution of the searched concordance across the entire corpus |
| - 2-dimensional frequency distribution reflecting the relationship between two attributes (both positional and structural attributes) |
| |
[{{ :en:manualy:kontext:fqdist-pozice_en.png?direct&300|Form for multilevel frequency distribution ([[en:pojmy:atributy_pozicni|positional attributes]]) }}] | [{{ :en:manualy:kontext:fqdist-pozice_en.png?direct&300|Form for multilevel frequency distribution ([[en:pojmy:atributy_pozicni|positional attributes]]) }}] |
Afterwards, it is necessary to select whether the frequency distribution should be calculated regardless of letter case. Selecting the [[wp>Case_sensitivity|case-insensitive]] option causes all items to be treated as lower case, regardless of the case they actually have in the corpus. | Afterwards, it is necessary to select whether the frequency distribution should be calculated regardless of letter case. Selecting the [[wp>Case_sensitivity|case-insensitive]] option causes all items to be treated as lower case, regardless of the case they actually have in the corpus. |
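The effect of the case-insensitive option can be illustrated with a small Python sketch. It is only a model of the behaviour described above (folding every item to lower case before counting); the function name ''frequency_list'' and the sample tokens are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable


def frequency_list(tokens: Iterable[str], case_insensitive: bool = False) -> Counter:
    """Count items, optionally treating all of them as lower case."""
    if case_insensitive:
        tokens = (token.lower() for token in tokens)
    return Counter(tokens)


# Illustrative sample, not real corpus data.
forms = ["Dřevo", "dřevo", "dřeva", "DŘEVO"]

# Case-sensitive: every spelling is a separate item with one occurrence.
print(frequency_list(forms).most_common())

# Case-insensitive: the three spellings of "dřevo" are merged.
print(frequency_list(forms, case_insensitive=True).most_common())
# [('dřevo', 3), ('dřeva', 1)]
```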
| |
[{{ :en:manualy:kontext:fqdist-reference_en.png?direct&300|Form for frequency distribution according to [[en:pojmy:atributy_strukturni|structural attributes]] }}] | [{{ :en:manualy:kontext:fqdist-reference_en.png?direct&300|Form for frequency distribution according to [[en:pojmy:atributy_strukturni|structural attributes]] }}] |
| |
With custom settings of the frequency distribution, we do not need to restrict ourselves to the KWIC only (unlike with the quick selection). The distribution can be calculated from any context position to the left or right of the searched word. The //position// item in the form lets us select not only positions in the left (preceding) context (6L-1L), but also the KWIC itself and positions in the right (following) context (1R-6R). The numbering of the positions (in both the current and the older notation) is summed up in the following table: | With custom settings of the frequency distribution, we do not need to restrict ourselves to the KWIC only (unlike with the quick selection). The distribution can be calculated from any context position to the left or right of the searched word. The //position// item in the form lets us select not only positions in the left (preceding) context (6L-1L), but also the KWIC itself and positions in the right (following) context (1R-6R). The numbering of the positions (in both the current and the older notation) is summed up in the following table: |
| |
Just like the items, the structural attributes can also be re-sorted in the table by any column. This is especially useful when we need the order by relative frequency, which makes the number of occurrences comparable even across corpora of different sizes. | Just like the items, the structural attributes can also be re-sorted in the table by any column. This is especially useful when we need the order by relative frequency, which makes the number of occurrences comparable even across corpora of different sizes. |
| |
| ==== Dispersion ==== |
| |
| The [[pojmy:frekvence#disperze_jevu|Dispersion]] function allows you to graphically represent the distribution of a given searched phenomenon across the text/corpus. In the initial form you need to set the number of sections (maximum 1000) into which the corpus will be divided for the purpose of displaying the dispersion. The resulting graph then shows the number of occurrences of the searched phenomenon within each section on the y-axis. |
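The sectioning described above can be sketched in a few lines of Python: the corpus is cut into equally sized sections and the hits falling into each section are counted. This is only an assumed model of the computation, not KonText's implementation; ''dispersion_counts'', its parameters and the sample positions are hypothetical.

```python
from typing import Iterable, List

MAX_SECTIONS = 1000  # upper limit mentioned in the text above


def dispersion_counts(hit_positions: Iterable[int], corpus_size: int, sections: int = 100) -> List[int]:
    """Count hits per section; positions are 0-based token offsets in the corpus."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    if not 1 <= sections <= MAX_SECTIONS:
        raise ValueError("sections must be between 1 and 1000")
    counts = [0] * sections
    for pos in hit_positions:
        if not 0 <= pos < corpus_size:
            raise ValueError(f"position {pos} lies outside the corpus")
        # Integer arithmetic maps each position to exactly one section index.
        counts[pos * sections // corpus_size] += 1
    return counts


# Illustrative data: four hits in a hypothetical 100-token text, four sections.
print(dispersion_counts([0, 5, 12, 99], corpus_size=100, sections=4))  # [3, 0, 0, 1]
```

The returned list corresponds to the y-axis values of the dispersion graph, one value per section.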
| |
| [{{en:manualy:kontext:disperze.png?direct&450|Dispersion of the lemma //dřevo// (division into 100 sections) in SYN2020}}] |
| |
| |
==== Two-attribute interrelationship frequency distribution ==== | ==== Two-attribute interrelationship frequency distribution ==== |
After clicking on **Make frequency list**, a table of results is displayed summarizing the number of occurrences of the adjectives in three selected text type groups (fiction, non-fiction and journalistic texts), sorted by frequency. This default setting can be changed: you can re-sort the table by [[en:pojmy:ipm|ipm]], switch the orientation of rows and columns or opt for a list of attribute pairs. If you are an advanced user, you can also sort the rows by one of three criteria (attribute value, or the total absolute/relative frequency in a row or in a column), set the confidence interval (CI) or adjust the color mapping (for further information, see the help question mark next to the **Color mapping** choice). If you choose the relative frequency display, you can also look at a graph with confidence intervals by clicking on the chart icon next to each variable. | After clicking on **Make frequency list**, a table of results is displayed summarizing the number of occurrences of the adjectives in three selected text type groups (fiction, non-fiction and journalistic texts), sorted by frequency. This default setting can be changed: you can re-sort the table by [[en:pojmy:ipm|ipm]], switch the orientation of rows and columns or opt for a list of attribute pairs. If you are an advanced user, you can also sort the rows by one of three criteria (attribute value, or the total absolute/relative frequency in a row or in a column), set the confidence interval (CI) or adjust the color mapping (for further information, see the help question mark next to the **Color mapping** choice). If you choose the relative frequency display, you can also look at a graph with confidence intervals by clicking on the chart icon next to each variable. |
| |
[{{:en::manualy:kontext:2d-fqdist_en.png?direct&350|Result of a 2D frequency distribution}}] | [{{:en::manualy:kontext:2d-fqdist_en.png?direct&400|Result of a 2D frequency distribution}}] |
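A two-attribute frequency distribution of this kind is essentially a cross-tabulation of value pairs. The Python sketch below shows the idea under stated assumptions: ''crosstab'' is a hypothetical helper, and the adjective and text-type pairs are invented for illustration rather than taken from any corpus.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def crosstab(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Build a row -> column -> absolute frequency table from value pairs."""
    table = Counter(pairs)
    rows = sorted({row for row, _ in table})
    cols = sorted({col for _, col in table})
    # A Counter returns 0 for pairs that never occurred, so every cell is filled.
    return {row: {col: table[(row, col)] for col in cols} for row in rows}


# Invented (adjective, text type) pairs, one per concordance line.
pairs = [
    ("new", "fiction"),
    ("new", "fiction"),
    ("new", "journalism"),
    ("old", "fiction"),
    ("old", "non-fiction"),
]

for row, cells in crosstab(pairs).items():
    print(row, cells)
```

Swapping the two elements of each pair before calling the helper corresponds to switching the orientation of rows and columns in the result table.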
\\ | \\ |
| |