
Differences

This shows you the differences between two versions of the page.

en:manualy:kontext:novy_dotaz [2023/03/13 20:04] – [Query types] jankrivan
en:manualy:kontext:novy_dotaz [2024/02/12 15:58] (current) jankocek
Line 71:
 ==== Restrict search ====
  
-If we need to search only in a narrowly defined group of texts in the entire corpus, we have two options. Either we create our own virtual [[en:manualy:kontext:subcorpus]], which we will then be able to select within the offered corpora, or we can restrict the query with a number of conditions (typically with the command [[en:pojmy:within|within]]). As a rule, we choose the first option in situations where we know that we will be needing the subcorpus for a longer duration of time or when its specification is complex. We use the second option when conducting ad hoc searches within some clearly defined text categories, which are specified with the help of the basic [[en:pojmy:atributy_strukturni|structural attributes]].
+We have two options if we need to search only in a narrowly defined group of texts in the entire corpus. Either we create our own virtual [[en:manualy:kontext:subcorpus]], which we will then be able to select within the offered corpora, or we can restrict the query with several conditions (typically with the command [[en:pojmy:within|within]]). As a rule, we choose the first option when we know that we will need the subcorpus for a longer time or when its specification is complex. We use the second option when conducting ad hoc searches within some clearly defined text categories specified with the help of the basic [[en:pojmy:atributy_strukturni|structural attributes]].
  
-The query form allows for simplification by way of an additional //Restrict search// tab which is located underneath the context search and is activated with a click, similar to the (above mentioned) context specification.
+The query form allows for simplification by way of an additional //Restrict search// tab located underneath the context search and activated with a click, similar to the (above-mentioned) context specification.
  
-[{{:en:manualy:kontext:hledani_subkorpus_en.png?direct&300|Form for searching in a subcorpus created ad hoc }}]
+[{{:en:manualy:kontext:hledani_subkorpus_en.png?direct&300|Form for searching in a subcorpus created ad hoc }}] 
  
  
 Within this form it is possible to mark off the values of selected structural attributes that interest us. The form does not contain all structural attributes, but only those most often used in the given corpus (e.g. when searching in the [[en:cnk:syn2020|SYN2020]] it is [[en:pojmy:txtype_group|txtype_group]], [[en:pojmy:txtype|txtype]], [[en:pojmy:genre|genre]], [[en:pojmy:srclang|srclang]]). The abbreviations used can be found in the [[en:seznamy:index|lists]] section.
  
-In the final column we can find a list of the specific [[en:pojmy:opus|opuses]] or [[en:pojmy:doc|documents]] (based on the selected corpus), which correspond to the specified condition. If such a list would be too long, the given column contains only the number of items. If we select some categories from the menu, we can view an inventory of texts which meet the given conditions with the help of the button **refine selection**. The column containing the list of texts is recalculated according to the currently marked criteria. We can continue in this way until we are satisfied with the demarcation of the data that we want to use for our search. It is possible to go back (option **Undo**) or cancel the entire selection (option **Reset selection**). You can also save the selection permanently (option Save as subcorpus), creating a new virtual [[en:pojmy:subkorpus|subcorpus]].
+In the final column we can find a list of the specific [[en:pojmy:opus|opuses]] or [[en:pojmy:doc|documents]] (based on the selected corpus), which correspond to the specified condition. If such a list is too long, the given column contains only the number of items. If we select some categories from the menu, we can view an inventory of texts which meet the given conditions with the help of the button **Refine selection**. The column containing the list of texts is recalculated according to the currently marked criteria. We can continue this way until we are satisfied with the demarcation of the data we want to use for our search. It is possible to go back (option **Undo**) or cancel the entire selection (option **Reset selection**). The selection can also be saved for later use (option **Save as subcorpus draft**), creating a new virtual [[en:pojmy:subkorpus|subcorpus]]. Furthermore, a list of documents in the current selection can be easily retrieved (option **Save a list of documents**), which can be handy, e.g. if you want to find out which fiction books are in the parallel InterCorp corpus for a given language(s).
  
-For a more detailed specification it is necessary to use the condition [[en:pojmy:within|within]] inside a [[en:kurz:pokrocile_dotazy#dotazovaci_jazyk|CQL]] query.
+For a more detailed specification, it is necessary to use the condition [[en:pojmy:within|within]] inside a [[en:kurz:pokrocile_dotazy#dotazovaci_jazyk|CQL]] query.
  
 ===== Paradigmatic query =====
Line 141:
   * whitelist -- a list of pre-selected words (in a separate file) which we want to see in the resulting list
   * blacklist -- a list of pre-selected words (in a separate file) which we want to exclude from the resulting list
- 
  
 Among the output option settings we can find a selection of either the absolute [[en:pojmy:frekvence|frequency]], [[en:pojmy:arf|ARF]] or a document count. Furthermore there is also the possibility to choose a specific output attribute (or attributes). These attributes **need not be** identical to the positional attribute selected in the top section of the form, on which all the above mentioned filters are applied. This enables us to create e.g. a frequency list of all verbs by selecting the attribute [[en:pojmy:tag|tag]] in the top section, applying the condition for a verb as in [[en:seznamy:tagy#pozice_1_-_slovni_druh|V.*]] and finally by "switching" the output type to [[en:pojmy:lemma|lemma]] – an example of such a query is shown in the picture.
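The separation between the attribute used for filtering and the attribute used for output can be illustrated with a minimal Python sketch. It is not KonText's implementation: the function name ''frequency_list'', the dictionary-based token representation and the shortened tags (''VB'', ''NN'') are assumptions made purely for illustration. The sketch filters tokens by a regular expression on one attribute and counts the values of another.

```python
import re
from collections import Counter

def frequency_list(tokens, filter_attr, pattern, output_attr):
    """Count values of output_attr for tokens whose filter_attr matches pattern.

    The whole attribute value has to match the regular expression,
    which is why re.fullmatch is used here (an assumption of this sketch).
    """
    rx = re.compile(pattern)
    return Counter(
        tok[output_attr] for tok in tokens if rx.fullmatch(tok[filter_attr])
    )

# Hypothetical mini-corpus; the tags are shortened and purely illustrative.
tokens = [
    {"word": "byl", "lemma": "být", "tag": "VB"},
    {"word": "pes", "lemma": "pes", "tag": "NN"},
    {"word": "jsou", "lemma": "být", "tag": "VB"},
    {"word": "šel", "lemma": "jít", "tag": "VB"},
]

# Filter on the tag (verbs only), but list lemmas in the output.
verbs = frequency_list(tokens, "tag", "V.*", "lemma")
print(verbs.most_common())  # [('být', 2), ('jít', 1)]
```

Changing only the last argument to ''"word"'' would list the verb forms instead of the lemmas, while the filter stays the same.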
 +
 +===== Keyword analysis =====
 +
 +[{{ :en:manualy:kontext:analyza_k_slov_en.png?direct&400|List of keywords in the ORAL v1 corpus compared to the SYN2020 reference corpus}}]
 +
 +The KonText interface can generate an inventory of keywords, i.e., forms or lemmas appearing in the selected (sub)corpus significantly more often than in the reference (sub)corpus, reflecting common language usage. (The analysis of keywords in the users’ texts is enabled by [[en:manualy:kwords|the specialized KWords application]].)
 +
 +Besides the corpus in which we want to find the terms in question, we also have to specify a reference corpus (or a subcorpus, e.g. if we want to compare a corpus consisting mainly of journalistic texts, such as one of the SYN corpora, with a subcorpus of fiction texts: SYN2020-BEL). Next, we specify by which positional attribute the terms should be searched, by which metric they should be sorted (Log-likelihood, Chi-square, or Difference index), and possibly we also specify the desired minimum or maximum frequency. Sought-after terms can further be filtered using a regular expression; the default .* expression will display all results (or, rather, the first 1000 occurrences).
 +
 +The resulting list of keywords in the form of a table is sorted according to the selected metric, with the remaining two also displayed, followed in the next columns by the absolute and relative frequency values in both corpora. The list of found keywords can be viewed in both corpora in the respective concordance through the positive filter (<fc #4682b4>p</fc> to the right of the absolute frequency value).
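To make the three metrics more concrete, the following Python sketch computes them for a single item from its absolute frequencies and the sizes of both corpora. It uses the commonly cited textbook formulas (two-cell log-likelihood, Pearson chi-square on a 2x2 table, and the difference index as the normalized difference of relative frequencies scaled to the range -100 to 100). Whether KonText computes the values in exactly this way is an assumption, and all function and parameter names are hypothetical.

```python
import math

def log_likelihood(a, b, c, d):
    """a, b: item frequency in the focus / reference corpus; c, d: corpus sizes."""
    total = a + b
    expected = (c * total / (c + d), d * total / (c + d))
    ll = 0.0
    for obs, exp in zip((a, b), expected):
        if obs > 0:  # a zero observation contributes nothing to the sum
            ll += obs * math.log(obs / exp)
    return 2 * ll

def chi_square(a, b, c, d):
    """Pearson chi-square on the 2x2 table (item vs. other tokens, focus vs. reference)."""
    table = ((a, c - a), (b, d - b))
    n = c + d
    row_totals = (c, d)
    col_totals = (a + b, n - a - b)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            exp = row_totals[i] * col_totals[j] / n
            if exp > 0:  # skip empty cells to avoid division by zero
                chi2 += (table[i][j] - exp) ** 2 / exp
    return chi2

def difference_index(a, b, c, d):
    """Normalized difference of relative frequencies; assumes a + b > 0."""
    rel_focus, rel_ref = a / c, b / d
    return 100 * (rel_focus - rel_ref) / (rel_focus + rel_ref)

# Hypothetical counts: 50 hits in 1,000 tokens vs. 100 hits in 10,000 tokens.
for metric in (log_likelihood, chi_square, difference_index):
    print(metric.__name__, round(metric(50, 100, 1000, 10000), 2))
```

An item with the same relative frequency in both corpora scores 0 on all three metrics, and an item found only in the focus corpus reaches the maximum difference index of 100.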
 +
 +
  
 ===== Recent queries =====
  
-The item displays an overview of the most recent queries used (a simplified list of previous queries is also accessible directly from the query form, via a link above the input line). These queries can be filtered according to the query type or the currently used corpus, and only archived queries can be viewed as well. By clicking on the link //Edit and search//, we paste the previously specified constraints into the query form and we may either use them without any changes, or we may modify them further (e.g. change the corpus in which the query will be used, the query type, or we may specify the context). By clicking on the //Archive// option, we can name the query and permanently save it to the query history archive for later reuse.
+The item displays an overview of the most recent queries used (a simplified list of previous queries is also accessible directly from the query form, via a link above the input line). These queries can be filtered according to the query type or the currently used corpus, and only archived queries can be viewed as well. By clicking on the link //Edit and search//, we paste the previously specified constraints into the query form and we may either use them without any changes, or we may modify them further (e.g. change the corpus in which the query will be used, the query type, or we may specify the context).  
 + 
 +By clicking on the gear and then on the **Archive** option, we can name the query and permanently save it to the query history archive for later reuse. The complete status of the query form is saved, e.g. also the selected text types.