Differences

This shows you the differences between two versions of the page.

en:manualy:kontext:novy_dotaz [2022/04/20 17:52] – [Menu: Query] Michal Škrabal
en:manualy:kontext:novy_dotaz [2024/02/12 15:58] (current) Jan Kocek
Line 42: Line 42:
  
 Once the query has been entered, the search can be started either by clicking on the **Search** button, or by pressing the Enter key (provided the focus is on the input line).
 +
 +=== Query evaluation ===
 +
 +If the search is successful, the concordance list page is displayed. Its use is described in detail on the [[en:manualy:kontext:konkordance|Concordance]] page.
  
 ==== Search suggestions ====
Line 67: Line 71:
 ==== Restrict search ====
  
-If we need to search only in a narrowly defined group of texts in the entire corpus, we have two options. Either we create our own virtual [[en:manualy:kontext:subcorpus]], which we will then be able to select within the offered corpora, or we can restrict the query with a number of conditions (typically with the command [[en:pojmy:within|within]]). As a rule, we choose the first option in situations where we know that we will be needing the subcorpus for a longer duration of time or when its specification is complex. We use the second option when conducting ad hoc searches within some clearly defined text categories, which are specified with the help of the basic [[en:pojmy:atributy_strukturni|structural attributes]].
+We have two options if we need to search only in a narrowly defined group of texts in the entire corpus. Either we create our own virtual [[en:manualy:kontext:subcorpus]], which we will then be able to select within the offered corpora, or we can restrict the query with several conditions (typically with the command [[en:pojmy:within|within]]). As a rule, we choose the first option when we know that we will need the subcorpus for a longer time or when its specification is complex. We use the second option when conducting ad hoc searches within some clearly defined text categories specified with the help of the basic [[en:pojmy:atributy_strukturni|structural attributes]].
  
-The query form allows for simplification by way of an additional //Restrict search// tab which is located underneath the context search and is activated with a click, similar to the (above mentioned) context specification.
+The query form allows for simplification by way of an additional //Restrict search// tab located underneath the context search and activated with a click, similar to the (above-mentioned) context specification.
  
-[{{:en:manualy:kontext:hledani_subkorpus.png?direct&300|Form for searching in a subcorpus created ad hoc}}]
+[{{:en:manualy:kontext:hledani_subkorpus_en.png?direct&300|Form for searching in a subcorpus created ad hoc }}]
  
  
 Within this form it is possible to mark off the values of selected structural attributes that interest us. The form does not contain all structural attributes, but only those most often used in the given corpus (e.g. when searching in the [[en:cnk:syn2020|SYN2020]] it is [[en:pojmy:txtype_group|txtype_group]], [[en:pojmy:txtype|txtype]], [[en:pojmy:genre|genre]], [[en:pojmy:srclang|srclang]]). The abbreviations used can be found in the [[en:seznamy:index|lists]] section.
  
-In the final column we can find a list of the specific [[en:pojmy:opus|opuses]] or [[en:pojmy:doc|documents]] (based on the selected corpus), which correspond to the specified condition. If such a list would be too long, the given column contains only the number of items. If we select some categories from the menu, we can view an inventory of texts which meet the given conditions with the help of the button **refine selection** (bottom left). The column containing the list of texts is recalculated according to the currently marked criteria. We can continue in this way until we are satisfied with the demarcation of the data that we want to use for our search.
+In the final column we can find a list of the specific [[en:pojmy:opus|opuses]] or [[en:pojmy:doc|documents]] (based on the selected corpus), which correspond to the specified condition. If such a list is too long, the given column contains only the number of items. If we select some categories from the menu, we can view an inventory of texts which meet the given conditions with the help of the button **Refine selection**. The column containing the list of texts is recalculated according to the currently marked criteria. We can continue this way until we are satisfied with the demarcation of the data we want to use for our search. It is possible to go back (option **Undo**) or cancel the entire selection (option **Reset selection**). The selection can also be saved for later use (option **Save as a subcorpus draft**), creating a new virtual [[en:pojmy:subkorpus|subcorpus]]. Furthermore, a list of documents in the current selection can be easily retrieved (option **Save a list of documents**), which can be handy, e.g. if you want to find out which fiction books are in the parallel InterCorp corpus for a given language or languages.
  
-For a more detailed specification it is necessary to either use the condition [[en:pojmy:within|within]] inside a [[en:kurz:pokrocile_dotazy#dotazovaci_jazyk|CQL]] query, or to create a new virtual [[en:pojmy:subkorpus|subcorpus]].
+For a more detailed specification, it is necessary to use the condition [[en:pojmy:within|within]] inside a [[en:kurz:pokrocile_dotazy#dotazovaci_jazyk|CQL]] query.
  
 ===== Paradigmatic query =====
  
-[{{ :manualy:kontext:paradigmaticky_dotaz.png?direct&400| Paradigmatic query FIXME }}]
+[{{ :en:manualy:kontext:paradigmaticky_dotaz_en.png?direct&400| Paradigmatic query }}]
  
 In addition to the syntagmatic query described above (where we search for the set of [[en:pojmy:token|tokens]] matching a query and display them as [[en:pojmy:kwic|KWIC]]s in the form of a [[en:pojmy:konkordance|concordance]]), we can also use a [[en:pojmy:paradigmaticky|paradigmatic query]]. This search actually combines several individual syntagmatic sub-queries and returns the intersection of their frequency distributions. The result of paradigmatic querying is thus the set of [[en:pojmy:typ|types]] that match //all// specified syntagmatic queries.
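
The intersection of frequency distributions described above can be illustrated with a minimal Python sketch. It does not reproduce KonText's implementation: the function name ''paradigmatic_intersection'', the ''min_freq'' parameter and the toy frequency data are all assumptions made for illustration.

```python
from collections import Counter


def paradigmatic_intersection(distributions, min_freq=1):
    """Return the types found in every distribution at least min_freq times.

    Each distribution maps a type to its frequency for one syntagmatic
    sub-query. The result maps each shared type to its list of frequencies,
    one per sub-query (hypothetical output layout).
    """
    if not distributions:
        return {}
    shared = None
    for dist in distributions:
        # Keep only the types that reach the minimum frequency in this sub-query.
        frequent = {t for t, f in dist.items() if f >= min_freq}
        shared = frequent if shared is None else shared & frequent
    return {t: [dist[t] for dist in distributions] for t in sorted(shared)}


# Toy frequency distributions for two sub-queries (invented numbers).
sub_query_a = Counter({"house": 12, "forest": 7, "dog": 3})
sub_query_b = Counter({"house": 5, "dog": 9, "cat": 4})

result = paradigmatic_intersection([sub_query_a, sub_query_b], min_freq=3)
print(result)
```

Only the types that occur in both toy distributions with a frequency of at least 3 are kept; raising ''min_freq'' narrows the result further.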
  
-[{{ :manualy:kontext:paradigma_vysledek.png?direct&400| Results of paradigmatic query FIXME}}]
+[{{ :en:manualy:kontext:paradigma_vysledek_en.png?direct&400| Results of paradigmatic query }}]
  
 In the query form, partial syntagmatic sub-queries should be entered in separate boxes (additional boxes can be added using the + button at the bottom or removed by clicking on the trashcan icon on the right). One can also specify parameters such as the default attribute, the minimum frequency of each syntagmatic query, and the position at which the frequency distribution will be applied to each of them.
Line 127: Line 131:
 The basic output of any query is a [[en:pojmy:konkordance|concordance]], i.e. a list of all the occurrences ([[en:pojmy:token|tokens]]) matching the query, along with their text surroundings. The **Word list** function evaluates the query in such a way that the result is a list of various words ([[en:pojmy:typ|types]]) matching the query, together with their absolute [[en:pojmy:frekvence|frequency]], [[en:pojmy:arf|ARF]] or the number of documents in which the sought phenomenon occurs. In this respect, the Word list function is analogous to [[en:manualy:kontext:frekvencni_distribuce|frequency distribution]]; however, its advantage is its speed and low computational complexity, because the extra step involving the concordance is not needed with the Word list.
  
-[{{ en:manualy:kontext:seznam_slov_slovesa.png?direct&300|Form for creating word lists FIXME}}]
+[{{ en:manualy:kontext:seznam_slov_slovesa_en.png?direct&300|Form for creating word lists }}]
  
 Various search parameters can be set in the form:
Line 137: Line 141:
   * whitelist -- a list of pre-selected words (in a separate file) which we want to see in the resulting list
   * blacklist -- a list of pre-selected words (in a separate file) which we want to exclude from the resulting list
- 
  
 Among the output option settings we can find a selection of either the absolute [[en:pojmy:frekvence|frequency]], [[en:pojmy:arf|ARF]] or a document count. Furthermore, there is also the possibility to choose a specific output attribute (or attributes). These attributes **need not be** identical to the positional attribute selected in the top section of the form, on which all the above-mentioned filters are applied. This enables us to create e.g. a frequency list of all verbs by selecting the attribute [[en:pojmy:tag|tag]] in the top section, applying the condition for a verb as in [[en:seznamy:tagy#pozice_1_-_slovni_druh|V.*]] and finally by "switching" the output type to [[en:pojmy:lemma|lemma]] – an example of such a query is shown in the picture.
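
The idea of filtering on one attribute and counting another can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function ''word_list'', the token dictionaries and the shortened tags are hypothetical, and the sketch assumes that the regular expression must match the whole attribute value.

```python
import re
from collections import Counter


def word_list(tokens, filter_attr, pattern, output_attr):
    """Count output_attr values of tokens whose filter_attr fully matches pattern."""
    regex = re.compile(pattern)
    return Counter(
        tok[output_attr] for tok in tokens if regex.fullmatch(tok[filter_attr])
    )


# Toy annotated tokens; the tags are shortened, invented stand-ins for real tags.
tokens = [
    {"word": "dělá", "lemma": "dělat", "tag": "VB"},
    {"word": "dělal", "lemma": "dělat", "tag": "Vp"},
    {"word": "pes", "lemma": "pes", "tag": "NN"},
    {"word": "spí", "lemma": "spát", "tag": "VB"},
]

# Filter on the tag attribute (verbs), but list lemmas in the output.
verbs = word_list(tokens, filter_attr="tag", pattern="V.*", output_attr="lemma")
print(verbs.most_common())
```

Switching ''output_attr'' to ''"word"'' would list the individual word forms of the same verbs instead of their lemmas.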
 +
 +===== Keyword analysis =====
 +
 +[{{ :en:manualy:kontext:analyza_k_slov_en.png?direct&400|List of keywords in the ORAL v1 corpus compared to the SYN2020 reference corpus}}]
 +
 +The KonText interface can generate an inventory of keywords, i.e., forms or lemmas appearing in the selected (sub)corpus significantly more often than in the reference (sub)corpus, reflecting common language usage. (The analysis of keywords in the users’ texts is enabled by [[en:manualy:kwords|the specialized KWords application]].)
 +
 +Besides the corpus in which we want to find the terms in question, we also have to specify a reference corpus (or a subcorpus, e.g. if we want to compare a corpus consisting mainly of journalistic texts, such as one of the SYN corpora, with a subcorpus of fiction texts: SYN2020-BEL). Next, we specify the positional attribute by which the terms should be searched and the metric by which they should be sorted (Log-likelihood, Chi-square, or Difference index), and we can also specify the desired minimum or maximum frequency. Sought-after terms can further be filtered using a regular expression; the default .* expression will display all results (or, rather, the first 1000 occurrences).
 +
 +The resulting list of keywords in the form of a table is sorted according to the selected metric, with the remaining two also displayed, followed in the next columns by the absolute and relative frequency values in both corpora. The list of found keywords can be viewed in both corpora in the respective concordance through the positive filter (<fc #4682b4>p</fc> to the right of the absolute frequency value).
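
To give a rough idea of what the three metrics measure, the following Python sketch uses common textbook formulations: a two-term log-likelihood, a 2x2 chi-square, and a difference index computed from relative frequencies. These formulas, the function names and the toy numbers are assumptions; the exact variants used by KonText may differ.

```python
import math


def log_likelihood(a, b, c, d):
    """Two-term log-likelihood: frequency a in a corpus of size c vs. b in size d."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def chi_square(a, b, c, d):
    """Chi-square of the 2x2 table (term vs. all other tokens, target vs. reference)."""
    o11, o12, o21, o22 = a, b, c - a, d - b
    n = c + d
    denom = (o11 + o12) * (o21 + o22) * (o11 + o21) * (o12 + o22)
    if denom == 0:
        return 0.0
    return n * (o11 * o22 - o12 * o21) ** 2 / denom


def difference_index(a, b, c, d):
    """Difference of relative frequencies scaled to the range -100..100."""
    rel_a, rel_b = a / c, b / d
    if rel_a + rel_b == 0:
        return 0.0
    return 100 * (rel_a - rel_b) / (rel_a + rel_b)


# Toy counts: (frequency in target, frequency in reference); both corpora 1000 tokens.
candidates = {"like": (30, 10), "the": (50, 50)}
ranked = sorted(
    candidates,
    key=lambda w: log_likelihood(*candidates[w], 1000, 1000),
    reverse=True,
)
print(ranked)
```

A term with the same relative frequency in both corpora scores zero on all three metrics, so it ends up at the bottom of the ranking.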
 +
 +
  
 ===== Recent queries =====
  
-The item displays an overview of the most recent queries used (a simplified list of previous queries is also accessible directly from the query form, via a link above the input line). These queries can be filtered according to the query type or the currently used corpus, and it is also possible to view only archived queries. By clicking on the link //Edit and search//, we paste the previously specified constraints into the query form and we may either use them without any changes, or we may modify them further (e.g. change the corpus in which the query will be used, the query type, or we may specify the context). By clicking on the //Archive// option, we can name the query and permanently save it to the query history archive for later reuse.
+The item displays an overview of the most recent queries used (a simplified list of previous queries is also accessible directly from the query form, via a link above the input line). These queries can be filtered according to the query type or the currently used corpus, and it is also possible to view only archived queries. By clicking on the link //Edit and search//, we paste the previously specified constraints into the query form and we may either use them without any changes, or we may modify them further (e.g. change the corpus in which the query will be used, the query type, or we may specify the context).
 + 
 +By clicking on the gear and then on the **Archive** option, we can name the query and permanently save it to the query history archive for later reuse. The complete status of the query form is saved, e.g. also the selected text types.