
Differences

This shows you the differences between two versions of the page.

en:manualy:kontext:frekvencni_distribuce [2016/11/08 16:55] – [Frequency list (summary)] veronikapojarova
en:manualy:kontext:frekvencni_distribuce [2018/01/09 11:08] – [Two-attribute interrelationship frequency distribution] michalskrabal
Line 1: Line 1:
 ====== Menu: Frequency ====== ====== Menu: Frequency ======
  
-In the [[en:manualy:kontext:index|KonText interface ]] menu the item //Frequency// includes the function for creating **frequency distribution**. With this function it is possible to get an overview of the [[en:pojmy:typ|types]] (e.g. of different words) in the search results, along with their frequency. If we wish to find all of the nouns in the genitive case and in the plural form, with this funtion we can determine which [[en:pojmy:word|words]] occur in this particular case and number and how frequently. It is also possible to use frequency distribution to determine the frequency of both the previous and the following units, calculate [[wp>Lemma_(psycholinguistics)|lemmas]] in the [[en:pojmy:konkordance|concordance]] or determine the distribution of the wanted phenomenon across different text types and their groups (according to the [[en:pojmy:genre|genre]], [[en:pojmy:txtype|txtype]] etc.).+In the [[en:manualy:kontext:index|KonText interface ]] menu the item //Frequency// includes the function for creating **frequency distribution**. With this function it is possible to get an overview of the [[en:pojmy:typ|types]] (e.g. of different words) in the search results, along with their frequency. If we wish to find all of the nouns in the genitive case and in the plural form, with this function we can determine which [[en:pojmy:word|words]] occur in this particular case and number and how frequently. It is also possible to use frequency distribution to determine the frequency of both the previous and the following units, calculate [[en:pojmy:lemma|lemmas]] in the [[en:pojmy:konkordance|concordance]] or determine the distribution of the wanted phenomenon across different text types and their groups (according to the [[en:pojmy:genre|genre]], [[en:pojmy:txtype|txtype]] etc.).
  
 Frequency distribution includes both custom (general) settings and **quick selection** (both are available at the second level of menu): Frequency distribution includes both custom (general) settings and **quick selection** (both are available at the second level of menu):
-  - **Lemmas** - asseses the query ([[en:pojmy:kwic|KWIC]]) and lists all of the different types of lemmas (attribute [[wp>Lemma_(psycholinguistics)|lemma]]), along with their frequency  ((This option is available only for the corpora that have been lemmatized))+  - **Lemmas** - assesses the query ([[en:pojmy:kwic|KWIC]]) and lists all of the different types of lemmas (attribute [[en:pojmy:lemma|lemma]]), along with their frequency  ((This option is available only for the corpora that have been lemmatized))
   - **Node forms** - assesses the query ([[en:pojmy:kwic|KWIC]]) and lists all of the different forms (attribute [[en:pojmy:word|word]]), along with their frequency    - **Node forms** - assesses the query ([[en:pojmy:kwic|KWIC]]) and lists all of the different forms (attribute [[en:pojmy:word|word]]), along with their frequency 
   - **Doc IDs** - assesses the whole [[en:pojmy:konkordance|concordance]] and lists the text names ([[en:pojmy:atributy_strukturni|structural attributes]] ''name'') in which the wanted phenomenon occurs, along with the frequency of this phenomenon in the individual texts    - **Doc IDs** - assesses the whole [[en:pojmy:konkordance|concordance]] and lists the text names ([[en:pojmy:atributy_strukturni|structural attributes]] ''name'') in which the wanted phenomenon occurs, along with the frequency of this phenomenon in the individual texts 
Line 16: Line 16:
  
  
-  - form for multilevel frequency distribution (which can be used to analyze [[en:pojmy:atributy_pozicni|positional attributes]]) such as word, lemma, tag etc.) +  - form for multilevel frequency distribution (which can be used to analyze [[en:pojmy:atributy_pozicni|positional attributes]]) such as word, lemma, tag etc.) 
- +
   - form for frequency distribution according to the [[en:pojmy:atributy_strukturni|structure attributes]] (such as ''[[en:pojmy:txtype|txtype]]'', ''[[en:pojmy:medium|med]]'' or ''[[en:pojmy:srclang|srclang]]'')   - form for frequency distribution according to the [[en:pojmy:atributy_strukturni|structure attributes]] (such as ''[[en:pojmy:txtype|txtype]]'', ''[[en:pojmy:medium|med]]'' or ''[[en:pojmy:srclang|srclang]]'')
 +  - form for frequency distribution reflecting the two-attribute interrelationship (both positional and structure attributes)
  
  
Line 27: Line 27:
 **Multilevel frequency distribution** enables us to  calculate frequency distribution of any concordance position within the span of 6 positions to the left and 6 to the right from [[en:pojmy:kwic|KWIC]]. At first it is necessary to select in the form which **attribute** we wish to calculate in frequency distribution (e.g. in [[en:cnk:syn| SYN ]] corpora there are available basic [[en:pojmy:atributy_pozicni|positional attributes]] ) ''word'', ''lemma'', ''tag'', ''lc'', ''pos'', along with specific attributes ''k'', ''g'', ''c'').   **Multilevel frequency distribution** enables us to  calculate frequency distribution of any concordance position within the span of 6 positions to the left and 6 to the right from [[en:pojmy:kwic|KWIC]]. At first it is necessary to select in the form which **attribute** we wish to calculate in frequency distribution (e.g. in [[en:cnk:syn| SYN ]] corpora there are available basic [[en:pojmy:atributy_pozicni|positional attributes]] ) ''word'', ''lemma'', ''tag'', ''lc'', ''pos'', along with specific attributes ''k'', ''g'', ''c'').  
  
-Afterwards, it is necessary to select whether frequency distribution should be calculated regerdless of the letter case. Selection of the option [[wp>Case_sensitivity|case-insensitive]] causes that all of the items are interpreted as having lower case, regardless of what type of case they actually have in the corpus.  +Afterwards, it is necessary to select whether frequency distribution should be calculated regardless of the letter case. Selection of the option [[wp>Case_sensitivity|case-insensitive]] causes that all of the items are interpreted as having lower case, regardless of what type of case they actually have in the corpus.  
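
The effect of the position and case settings described above can be sketched in a few lines of Python. This is only an illustration, not KonText code: the function `position_frequencies`, the `(tokens, kwic_index)` representation of a concordance line and the sample data are all assumptions made for this example.

```python
from collections import Counter

def position_frequencies(lines, offset=0, case_insensitive=True):
    """Count the tokens found `offset` positions away from the KWIC.

    `lines` is a list of (tokens, kwic_index) pairs (a hypothetical format).
    A negative offset looks left of the KWIC (-1 is 1L), a positive one
    looks right (1 is 1R), and 0 is the KWIC itself.
    """
    counts = Counter()
    for tokens, kwic_index in lines:
        i = kwic_index + offset
        if 0 <= i < len(tokens):  # skip lines whose context is too short
            token = tokens[i]
            counts[token.lower() if case_insensitive else token] += 1
    return counts

# Invented sample lines; the second element is the index of the KWIC token.
concordance = [
    (["z", "tmavého", "dřeva", "a"], 2),
    (["Dřevo", "je", "těžké"], 0),
    (["kus", "dřeva", "."], 1),
]

kwic_counts = position_frequencies(concordance, offset=0)
left_counts = position_frequencies(concordance, offset=-1)
print(kwic_counts.most_common())
print(left_counts.most_common())
```

With `case_insensitive=False` the form `Dřevo` would be counted separately from `dřevo`, which mirrors the choice offered in the form.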
  
-In case of custom settings of frequency distribution, we do not need to restrict ourselves to KWIC only (unlike when working with quick selection). It can be calculated from any context position to the right or left from the wanted word. The item //position// in the form enables us to select not only positions from the lext (the preceding) context (6L-1L), but also KWIC itself and positions to the right (the following) context (1R-6R). The numbering of the positions (according to both current and older notation) is summed up in the following table:+In case of custom settings of frequency distribution, we do not need to restrict ourselves to KWIC only (unlike when working with quick selection). It can be calculated from any context position to the right or left from the wanted word. The item //position// in the form enables us to select not only positions from the left (the preceding) context (6L-1L), but also KWIC itself and positions to the right (the following) context (1R-6R). The numbering of the positions (according to both current and older notation) is summed up in the following table:
  
 ^ concordance  | místnosti | . | Byly | z | těžkého | tmavého |  **<fc #FF0000>dřeva</fc>**  | a | zlověstně | zaskřípaly | . | Poslepu | jsem | ^ concordance  | místnosti | . | Byly | z | těžkého | tmavého |  **<fc #FF0000>dřeva</fc>**  | a | zlověstně | zaskřípaly | . | Poslepu | jsem |
Line 45: Line 45:
 [{{ :en:manualy:kontext:fqdist-reference.png?direct&300|Form for frequency distribution according to [[en:pojmy:atributy_strukturni|structural attributes]] }}] [{{ :en:manualy:kontext:fqdist-reference.png?direct&300|Form for frequency distribution according to [[en:pojmy:atributy_strukturni|structural attributes]] }}]
  
-Provided that we are satisfied with the specification, we may begin the calculation by clicking on the **Make frequency list** button. All of the items with at least one occurence  will appear in the basic settings. If we wish to narrow the list down, we may set **Frequency limit** to the value which satisfies the situation.+Provided that we are satisfied with the specification, we may begin the calculation by clicking on the **Make frequency list** button. All of the items with at least one occurrence  will appear in the basic settings. If we wish to narrow the list down, we may set **Frequency limit** to the value which satisfies the situation.
  
  
-==== Text Type frequency distribution====+==== Text Type frequency distribution ====
  
-The settings of **Text Type frequency distribution** is located in the second part of the form. It is used only in those cases when the subject of the research depends on what text types do the occurences in the concordance occur (if we are interested in [[en:pojmy:txtype|txtype]]), [[en:pojmy:srclang|srclang]], [[en:pojmy:medium|medium]] etc.).+The settings for **Text Type frequency distribution** are located in the second part of the form. They are used only when the research question depends on the text types in which the concordance hits occur (e.g. when we are interested in [[en:pojmy:txtype|txtype]], [[en:pojmy:srclang|srclang]], [[en:pojmy:medium|medium]] etc.).
  
 In the displayed list we may use the mouse to highlight the metainformation whose values we wish to calculate in the frequency distribution. If we select more than one value (by clicking on the Ctrl button), the search will result in more than one list - unlike in the previous case, this is not a multilevel analysis (in which the data from various levels combine), but successive launch of a number of different kinds of frequency distribution which results in a number of frequency lists. In the displayed list we may use the mouse to highlight the metainformation whose values we wish to calculate in the frequency distribution. If we select more than one value (by clicking on the Ctrl button), the search will result in more than one list - unlike in the previous case, this is not a multilevel analysis (in which the data from various levels combine), but successive launch of a number of different kinds of frequency distribution which results in a number of frequency lists.
  
 Even in this form we may set the frequency limit, if we wish to restrict the number of results in the list. With the option **Include categories with no hits** it is also possible to display those attributes in the list which did not appear in the concordance. Lemma //dřevo// has not once appeared in the songs (txtype [[en:seznamy:txtype|SON]]). Provided that this option is ticked, txtype SON will appear in the frequency distribution even with a zero frequency. Even in this form we may set the frequency limit, if we wish to restrict the number of results in the list. With the option **Include categories with no hits** it is also possible to display those attributes in the list which did not appear in the concordance. Lemma //dřevo// has not once appeared in the songs (txtype [[en:seznamy:txtype|SON]]). Provided that this option is ticked, txtype SON will appear in the frequency distribution even with a zero frequency.
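
As a rough illustration of how the frequency limit and the **Include categories with no hits** option shape the resulting list, consider the following Python sketch. It is not KonText code: the function name, the list of category codes and the sample hits are assumptions, and so is the choice to exempt zero-frequency categories from the frequency limit.

```python
from collections import Counter

def text_type_frequencies(hit_txtypes, all_txtypes, freq_limit=0, include_empty=False):
    """Count concordance hits per text type.

    `hit_txtypes` holds the text type of each hit; `all_txtypes` lists every
    category known to the corpus (both are hypothetical inputs).
    """
    counts = Counter(hit_txtypes)
    if include_empty:
        for txtype in all_txtypes:
            counts.setdefault(txtype, 0)  # add categories without any hit
    # Assumption: the limit applies to categories with hits only, so that
    # zero-frequency categories stay visible when explicitly requested.
    return {
        txtype: n
        for txtype, n in counts.most_common()
        if n >= freq_limit or (include_empty and n == 0)
    }

hits = ["NOV", "NOV", "PUB", "SCI"]        # invented sample data
categories = ["NOV", "PUB", "SCI", "SON"]  # SON has no hits

with_empty = text_type_frequencies(hits, categories, include_empty=True)
limited = text_type_frequencies(hits, categories, freq_limit=2)
print(with_empty)
print(limited)
```

Here `SON` appears with a zero frequency only when `include_empty` is set, matching the behaviour described for the lemma //dřevo//.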
 +
 +==== Two-attribute interrelationship frequency distribution ====
 +
 +[{{ :manualy:kontext:2d-fqdist.png?nolink&450|Result of a 2D frequency distribution}}] 
 +
 +The last type of frequency distribution reflects the interrelationship of two selected attributes (positional as well as structural). As an example, we can look at which nominal adjectives (the so-called short forms, such as //rád// or //schopen//) are prominent in three basic text type groups. To do so, choose **Two-attribute interrelationship** under **Custom** in the **Frequency** menu and select two attributes: first **lemma** (displayed as rows in the table of results), then **doc.txtype_group** (listed among Text types, displayed as columns in the table). You can also adjust the minimum value or percentile of [[en:pojmy:frekvence|absolute or relative frequency]].
 +
 +
 +After clicking on **Make frequency list**, a table of results is displayed summarizing the number of occurrences of the adjectives in the three selected text type groups (fiction, non-fiction and journalistic texts), sorted by frequency. This default setting can be changed: you can re-sort the table by [[en:pojmy:ipm|ipm]], switch the orientation of rows and columns or opt for a list of attribute pairs. If you are an advanced user, you can also sort the rows by one of three criteria (attribute value, or the total of absolute/relative frequency in a row or in a column), set the confidence interval (CI) or adjust the color mapping (for further information, see the help question mark next to the **Color mapping** choice). If you choose the relative frequency display, you can also look at a graph with confidence intervals by clicking on the chart icon next to each variable.
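
The table described above is essentially a cross-tabulation of two attribute values. The Python sketch below shows the idea under stated assumptions: the function names, the `(lemma, txtype_group)` pairs and the group sizes are invented for illustration, and ipm is assumed here to be computed relative to the size of each text type group.

```python
from collections import Counter

def cross_tabulate(pairs):
    """Build a rows-by-columns table of counts from (row, column) pairs."""
    table = {}
    for row, col in pairs:
        table.setdefault(row, Counter())[col] += 1
    return table

def ipm(count, size_in_tokens):
    """Instances per million tokens for a part of the corpus of the given size."""
    return count * 1_000_000 / size_in_tokens

# Invented hits: one (lemma, text type group) pair per concordance line.
hits = [
    ("rád", "fiction"), ("rád", "fiction"), ("rád", "journalism"),
    ("schopen", "non-fiction"), ("schopen", "journalism"),
]
group_sizes = {"fiction": 4_000_000, "non-fiction": 2_000_000, "journalism": 1_000_000}

table = cross_tabulate(hits)
for lemma, row in table.items():
    for group, count in row.items():
        print(lemma, group, count, ipm(count, group_sizes[group]))
```

Swapping the two elements of each pair before calling `cross_tabulate` corresponds to switching the orientation of rows and columns.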
 +
 +
 ===== Frequency list (summary) ===== ===== Frequency list (summary) =====
  
 [{{ :en:manualy:kontext:fqdist-word-drevo.png?direct&300|Frequency list of the words of lemma //dřevo// }}] [{{ :en:manualy:kontext:fqdist-word-drevo.png?direct&300|Frequency list of the words of lemma //dřevo// }}]
  
-The following examples show how to use frequency list when  working with the [[en:cnk:syn2010|SYN2010]] corpus to search for a query of [[wp>Lemma_(psycholinguistics)|lemma]] //dřevo//.+The following examples show how to use frequency list when  working with the [[en:cnk:syn2010|SYN2010]] corpus to search for a query of [[en:pojmy:lemma|lemma]] //dřevo//.
 (''[lemma=%%"%%dřevo%%"%%]'').  (''[lemma=%%"%%dřevo%%"%%]''). 
   - Frequency list of the words of lemma //dřevo// regardless of case and with a zero frequency limit.   - Frequency list of the words of lemma //dřevo// regardless of case and with a zero frequency limit.
   - Frequency distribution of the values of structural attributes ''txtype'' and ''txtype_group''  of lemma //dřevo// (including the values with zero frequency)   - Frequency distribution of the values of structural attributes ''txtype'' and ''txtype_group''  of lemma //dřevo// (including the values with zero frequency)
      
-Different kinds of information will appear by every [[en:pojmy:word|word]] (attribute) displayed in the frequency list of lemma //dřevo//. The basic informaton is located in the frequency column and displays absolute frequency of a given item in the searched concordance (if the concordance was altered in some way before the frequency list was submitted - e.g. with filters - the frequency list will be altered accordingly). In the list to the left from the word, there are located links **p/n** which can be used for a quick display of positive or negative [[en:manualy:kontext:filtr|filter]]. By clicking on the **p** in the line displaying frequency for the word //dřevem//, we filter out this form from the current concordance,in the same way when **n** is activated, all of the occurrence of the given form will be eliminated from the current concordance.+Several kinds of information appear next to every [[en:pojmy:word|word]] (attribute value) displayed in the frequency list of lemma //dřevo//. The basic information is located in the frequency column, which displays the absolute frequency of a given item in the searched concordance (if the concordance was altered in some way before the frequency list was created, e.g. with filters, the frequency list changes accordingly). To the left of each word are the links **p/n**, which quickly apply a positive or negative [[en:manualy:kontext:filtr|filter]]. Clicking on **p** in the line for the word //dřevem// keeps only the occurrences of this form in the current concordance; clicking on **n** removes all occurrences of the given form from the current concordance.
  
 The last column of the frequency list contains a horizontal bar chart. It is used for completing the differences between absolute frequencies of the individual items (the length of the horizontal lines should correspond to the word frequency). The last column of the frequency list contains a horizontal bar chart. It is used for completing the differences between absolute frequencies of the individual items (the length of the horizontal lines should correspond to the word frequency).