AplikaceAplikace
Nastavení

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
en:manualy:kontext:novy_dotaz [2018/08/06 10:52] – [Word list] Michal Škrabalen:manualy:kontext:novy_dotaz [2024/02/12 15:58] (current) Jan Kocek
Line 1: Line 1:
 ====== Menu: Query ====== ====== Menu: Query ======
  
-[{{:en:manualy:kontext:novy_dotaz_en.png?direct&300|Form for creating a query}}]+The primary querying method in the corpus is a syntagmatic query, which results in a [[en:pojmy:konkordance|concordance]], i.e. a list of all occurrences ([[en:pojmy:token|tokens]]) corresponding to the query together with their immediate context. For syntagmatic query, use the **Query → Concordance** option.
  
-With the selection **Query → New query**, it is always possible to begin a new corpus search. By clicking on this option we abandon our previous query and any results produced with it, and we begin with a new search. The following text primarily deals with creating queries in single language corpora; the specifics of searching the parallel corpora [[en:cnk:intercorp|InterCorp]] are described in detail in the [[en:kurz:hledani_v_paralelnim_korpusu#paralelni_korpusy_v_rozhrani_kontext|bonus tutorial]] of the basic course on working with the CNC (Czech National Corpus).+An extension of syntagmatic search is the [[en:pojmy:paradigmaticky|paradigmatic query]], which is a combination of several individual syntagmatic sub-queries and yields the intersection of their frequency distributions. Thus, the result of paradigmatic querying is the set of [[en:pojmy:typ|types]] common to //all// individual syntagmatic queries. For a paradigmatic query, use the **Query → Paradigmatic query** option.
  
-After clicking on the item New Query, a basic search menu appears for the user. Within this form it is possible to select corpus in which the search will be conducted, and the [[en:eebo:first_query#query_types|type of query]] which will be used. The query itself is inserted into the input line. Another part of the form is an interactive **international keyboard** for writing special characters (especially when searching in non-Czech texts and when inserting special characters of the Corpus Query Language, [[en:pojmy:cql|CQL]]). Previously used queries can be found under the link //Previous Queries//located above the query line.+Another additional function allows to produce list of the different words (types) that match the query, along with their [[en:pojmy:frekvence|absolute frequency]], [[en:pojmy:arf|ARF]] or the number of documents in which the query occurs. To obtain such a listuse the **Query → Word list** option.
  
-===== Corpus selection =====+ 
 +===== Concordance ===== 
 + 
 +[{{:en:manualy:kontext:novy_dotaz_en.png?direct&350 |Form for creating a query }}] 
 + 
 +With the selection **Query → Concordance**, it is always possible to begin a new corpus search. By clicking on this option we abandon our previous query and any results produced with it, and we begin with a new search. The following text primarily deals with creating queries in single language corpora; the specifics of searching the parallel corpora [[en:cnk:intercorp|InterCorp]] are described in detail in the [[en:kurz:hledani_v_paralelnim_korpusu#paralelni_korpusy_v_rozhrani_kontext|bonus tutorial]] of the basic course on working with the CNC (Czech National Corpus). 
 + 
 +After clicking on the item Concordance, a basic search menu appears for the user. Within this form it is possible to select a corpus in which the search will be conducted and enter a query in the input line below it. You can use the switch to activate the **Advanced query** function which works with the [[en:pojmy:cql|Corpus Query Language]]. The form also includes an interactive **international keyboard** for entering special characters (especially when searching in non-Czech texts and when inserting special characters of the CQL). Previously asked questions can be found either directly in the menu or using the **Recent queries** link above the query line. The last item in the toolbar above the line is **Query interpretation**, where the user finds out how his/her query will be evaluated (i.e. translated into CQL) and whether this interpretation is in accordance with his/her intention. This function is not available when switching to advanced mode, but instead it is possible to directly insert interactively generated morphological tags (for corpora that are tagged in this way) or within conditions (see items **Insert tag** and **Insert within**). 
 + 
 +==== Corpus selection ====
  
 The selection of a [[en:pojmy:korpus|corpus]] suitable for solving the given research question is an important decision which must be made before taking any other steps in the research. The range of corpora made available through the project of the Czech National Corpus is constantly widening, as can be seen on the [[en:cnk:uvod|list of corpora]]. It has therefore been necessary to adjust the corpus selection on the KonText interface in order to accommodate their growing number. Until the autumn of 2015, the corpus selection had the shape of a hierarchically organized tree; this system had several disadvantages: from the not always definite placement of the given corpus within the hierarchy, to a great increase in the number of new corpora and their versions. As a result of this, the hierarchical organization stopped being clear and sustainable in the future, which is why we have switched to the new, label-based system. Its aim is primarily to facilitate orientation in the large number of existing corpora, while simultaneously simplifying work for those users who use only a small number of favorite corpora. The selection of a [[en:pojmy:korpus|corpus]] suitable for solving the given research question is an important decision which must be made before taking any other steps in the research. The range of corpora made available through the project of the Czech National Corpus is constantly widening, as can be seen on the [[en:cnk:uvod|list of corpora]]. It has therefore been necessary to adjust the corpus selection on the KonText interface in order to accommodate their growing number. Until the autumn of 2015, the corpus selection had the shape of a hierarchically organized tree; this system had several disadvantages: from the not always definite placement of the given corpus within the hierarchy, to a great increase in the number of new corpora and their versions. As a result of this, the hierarchical organization stopped being clear and sustainable in the future, which is why we have switched to the new, label-based system. Its aim is primarily to facilitate orientation in the large number of existing corpora, while simultaneously simplifying work for those users who use only a small number of favorite corpora.
Line 18: Line 27:
     - **All corpora** with the possibility of searching all available corpora with the aid of so-called **labels**, which are used to characterize the corpora (a typical corpus has several labels, e.g. SYN2010: written, synchronic, Czech, SYN family, representative). For example, if you are looking for a web corpus of Czech, all you have to do is select the labels "Czech" + "web", and all the relevant corpora made available by the CNC will appear. The search may be further refined by typing part of the corpus name or its description into the search bar, and the resulting list of corpora is interactively filtered based on the keywords. However, it must be noted that for spatial reasons the list only shows the first 25 items; if the list is too long, the query must be specified further with the addition of another label, or by searching for a part of its name.     - **All corpora** with the possibility of searching all available corpora with the aid of so-called **labels**, which are used to characterize the corpora (a typical corpus has several labels, e.g. SYN2010: written, synchronic, Czech, SYN family, representative). For example, if you are looking for a web corpus of Czech, all you have to do is select the labels "Czech" + "web", and all the relevant corpora made available by the CNC will appear. The search may be further refined by typing part of the corpus name or its description into the search bar, and the resulting list of corpora is interactively filtered based on the keywords. However, it must be noted that for spatial reasons the list only shows the first 25 items; if the list is too long, the query must be specified further with the addition of another label, or by searching for a part of its name.
  
-**Example**: The user searches in the tab //All corpora// for a current version of the English section of the parallel corpus [[en:cnk:intercorp|InterCorp]]. He first selects the labels "InterCorp" a "current version ", the first 25 corpora which conform to the specified conditions appear on the list, although InterCorp contains many more languages. The corpora not displayed can be accessed with the help of further filtering, for example by typing part of the corpus name or language (please note that the names of the individual InterCorp versions are in English). After finding the desired corpus and clicking on it, the corpus becomes the current corpus for searching, and it is at the same time possible to mark it as favourite with a star. The corpus is added to the list of favourite corpora and can be accessed quickly and easily with a single click.+**Example**: The user searches in the //All corpora// tab for a current version of the English section of the parallel corpus [[en:cnk:intercorp|InterCorp]]. He first selects the labels "InterCorp" a "current version ", the first 25 corpora which conform to the specified conditions appear on the list, although InterCorp contains many more languages. The corpora not displayed can be accessed with the help of further filtering, for example by typing part of the corpus name or language (please note that the names of the individual InterCorp versions are in English). After finding the desired corpus and clicking on it, the corpus becomes the current corpus for searching, and it is at the same time possible to mark it as favourite with a star. The corpus is added to the list of favourite corpora and can be accessed quickly and easily with a single click.
  
-===== Type of Query =====+==== Query types ====
  
-**Query types** and their uses+There are two query types in the current version of KonText: **simple** a **advanced**.
  
-^ Query type ^ What it’s for ^ How it works ^ What it does ^ Examples ^ +<WRAP round tip 70%> 
-^ Basic Query | for familiarization with the corpus| Searches for the expression as a node form regardless of case; in the case of a dictionary form (lemma), all of its possible forms are also searched for. | without [[en:pojmy:regularni_vyrazy|regular expressions]] (RE), [[wp>Case_sensitivity|case-insensitive]] | ''old house''//old house, older houses, oldest house…//\\ ''this time''//this time// | +Previous versions of KonText worked with six query types: //basic////lemma//, //phrase//, //word form////character//, and //CQL//. The current //simple// query covers the first five query typesas their functionality can be achieved by altering the simple query settings, e.g. the default attribute and/or regular expressions (see below). The current //advanced// query fully corresponds to the CQL query type. 
-^ Lemma | for the analysis of an entire paradigm/lexeme | Finds all forms associated with the given [[en:pojmy:lemma|lemma]]. | RE (it is possible to use regular expressions), case-sensitive, possibility of specifying word class | ''see''//see, saw, seenseeing…//\\ ''new''//newnewer, newest…// +</WRAP>
-^ Phrase | for a multiword combination in the given form | Finds the exact wording of a phrase. | RE, case-sensitive | ''black dog''//black dog//\\ ''new car''//new car//\\ ''newer car'' > //newer car// | +
-^ Node form | for the analysis of one specific form | Finds the exact form. | REcase-in/sensitive (possible to select Match case) | ''cat''//cat//\\ ''cats'' > //cats//\\ ''cat.*''//cat, cats, Cats, CATS…// +
-^ Character | searching for a string of characters anywhere within a word | Finds consecutive characters within the scope of one word. | REcase-sensitive | ''pre'' > //prepare, supreme, appreciate…//\\ ''str'' > //industrial, strong, orchestra…//+
-^ CQL | searching for anything that can be found with the corpus manager | CQL is [[en:pojmy:dotazovaci_jazyk|Corpus Query Language]] (into which the KonText interface internally converts all of the previous types of queries). | RE, case-sensitive, [[en:pojmy:dotazovaci_jazyk|CQL]] syntax | ''[lemma=<nowiki>"</nowiki>see<nowiki>"</nowiki>]''//see, saw, seen, seeing…//\\ ''[word=<nowiki>"</nowiki>black<nowiki>"</nowiki>]'' > //black//\\  ''[lemma=<nowiki>"</nowiki>read<nowiki>"</nowiki>][tag=<nowiki>"</nowiki>N.*<nowiki>"</nowiki>]'' > //reading books, read something…// |+
  
 +The default setting is the **simple query** with case-insensitive matching (the **Match case** switch is off), without regular expressions (the **Allow regular expressions** switch is off) and with the **default attribute** set to ''lemma|word'' (''lemma|sublemma|word'' in SYN2020). The latter setting denotes searching not only for the input word form (given by the ''word'' attribute), but also other word forms subsumed under ''lemma'' or ''sublemma'', provided the input word can also be interpreted as a lemma or sublemma (remark: this is exactly the behaviour of the basic query of the previous versions of KonText, that has only been extended to sublemma). Apart from the individual words, it is also possible to input multi-word phrases. The search can be further specified either by using the add-on for suggesting variants (SYN2020 only, see next section), or by changing the default attribute, and/or toggling the case-sensitivity switch. Furthermore, regular expressions can be used in the simple query mode after toggling the Allow regular expressions switch.
  
 +**Advanced query** mode is activated by a switch above the input line. This mode fully corresponds to the CQL query mode of the previous versions of KonText. When entering a [[en:pojmy:cql|CQL]] query, KonText automatically checks and highlights the query syntax. If the syntax is not valid, it notifies the user and lets them edit the query. Since CQL is quite complex, the syntax checking may occasionally issue a warning also in case of a valid query.
  
-**Corpus selection** and **query type** can influence what the form looks like:  +Once the query has been entered, the search can be started either by clicking on the **Search** button, or by pressing the Enter key (provided the focus is on the input line).
-  - Corpora which are not lemmatized, i.e. do not offer //lemma// as a [[en:eebo:first_query#query_types|query type]]. +
-  - Some query types (only those where it makes sense) allow for the user to specify whether the query should be assessed with respect to capitalization ([[wp>Case_sensitivity|case-sensitive]]), or without considering upper/lower case ([[wp>Case_sensitivity|case-insensitive]]). +
-  - In the case of the [[en:eebo:first_query#query_types|query types]] [[en:pojmy:lemma|lemma]] and [[en:pojmy:word]] it is also possible to specify word class (position attribute [[en:pojmy:pos|pos]]). +
-  - The [[en:pojmy:dotazovaci_jazyk|CQL]] query type also allows for the insertion of interactively generated [[en:pojmy:tag|morphological tags]] (with corpora which are tagged in this way) or conditions specifying texts in which the search is to be carried out (the condition [[en:pojmy:within]])+
-  - A very specific way of inputting queries for [[en:kurz:hledani_v_paralelnim_korpusu|searches in parallel corpora]].+
  
-Once the query has been typed, we may begin the search either by clicking on the **Search** button, or by pressing the Enter key, if the cursor is in the input line.+=== Query evaluation ===
  
-Every query can be specified further for the context in which the term is found and documents in which we want to search.+If the search is successful, the concordance list page is displayed. Its use is described in detail on the [[en:manualy:kontext:konkordance|Concordance]] page.
  
-===== Specify context =====+==== Search suggestions ==== 
 + 
 +[{{ :en:manualy:kontext:dotaz_naseptavac_brejli.png?direct&400|Search suggestions}}] 
 + 
 +For corpora with two-level lemmatization (starting with [[en:cnk:syn2020|SYN2020]] and [[en:cnk:syn:verze9|SYN version 9]]), KonText can suggest alternative ways of formulating a given query. Query terms with suggestions available are indicated by a blue background and a little question mark next to them. Ctrl/Command-clicking on such a term pops up a menu of lemmas, sublemmas and lc (case-insensitive forms), from which the most appropriate interpretation can be selected. After tweaking a query term in this way, its background changes to red and the **Query interpretation** option above the input line is highlighted. 
 + 
 +For instance, when a user types in the form //brejlí//, KonText notifies them that this form is annotated as two different lemmas in SYN2020 (//brejlit// and //brýle//), and furthermore, it shows that lemma //brýle// includes two sublemmas, namely stylistic variants //brýle// and //brejle//. Thus, the user has the option to modify their query to search for either (i) all forms matching the lemma in the chosen row (by picking an option in the **lemma** column, e.g. //brýle//), or (ii) forms matching only a specific sublemma within the lemma (cf. the **sublemma** column, e.g. //brejle//), or (iii) only a concrete case-insensitive form within the lemma (cf. the **lc** column, e.g. //brejlí// under the //brýle// lemma). Similarly, KonText notifies the user if lemmas differ only in terms of case, e.g. //Procházka// (common surname) vs. //procházka// (a walk). 
 + 
 +==== Specify parameters ==== 
 + 
 +As mentioned above, one can also specify additional parameters that influence the interpretation of a query. Apart from the default positional attribute, there are two more switches available in the simple query mode: case-sensitivity and allowing the use of regular expressions in a query. 
 + 
 +==== Specify context ====
  
 [{{ :en:manualy:kontext:hledani_kontext.png?direct&300|Form for searching in context}}] [{{ :en:manualy:kontext:hledani_kontext.png?direct&300|Form for searching in context}}]
Line 51: Line 65:
 Every query can be further specified based on the context (the surrounding text) in which the search word or phrase can appear. The context menu, which is used for the specification of a query, can be found in the bottom section of the query form (it is hidden in the basic settings and it is necessary to activate it by clicking on **Specify context**). Every query can be further specified based on the context (the surrounding text) in which the search word or phrase can appear. The context menu, which is used for the specification of a query, can be found in the bottom section of the query form (it is hidden in the basic settings and it is necessary to activate it by clicking on **Specify context**).
  
-Searching the context essentially means additionally [[en:manualy:kontext:filtr|filtering]] the basic concordance which is specified by the query in the main part of the form. The user can set the span of the context to which the additional filter condition will be applied, the query type, or word class.+Searching the context essentially means additionally [[en:manualy:kontext:filtr|filtering]] the basic concordance which is specified by the query already in the query form. The user can set the span of the context to which the additional filter condition will be applied, particular lemma(s), or word class(es).
  
-In general it can be said that any given search in context can be rewritten as an ordinary query which is then filtered (with the aid of positive and negative filters). Any filtering can also be carried out with the help of [[en:pojmy:dotazovaci_jazyk|query language]] , performing the identical operation in a single step. The reality is that there are always more ways to achieve the same result, and it is entirely up to the user to decide which option he is most comfortable with.+In generalit can be said that any given search in context can be rewritten as an ordinary query which is then filtered (with the aid of positive and negative filters). Any filtering can also be carried out with the help of [[en:pojmy:dotazovaci_jazyk|query language]], performing the identical operation in a single step. The reality is that there are always more ways to achieve the same result, and it is entirely up to the user to decide which option he/she is most comfortable with.
  
-===== Specify query according to the meta-information =====+==== Restrict search ====
  
-If we need to search only in a narrowly defined group of texts in the entire corpus, we have two options. Either we create our own virtual [[en:manualy:kontext:subcorpus]], which we will then be able to select within the offered corpora, or we can restrict the query with a number of conditions (typically with the command [[en:pojmy:within|within]]). As a rule, we choose the first option in situations where we know that we will be needing the subcorpus for a longer duration of timeor when its specification is complex. We use the second option when conducting ad hoc searches within some clearly defined text categories, which are specified with the help of the basic [[en:pojmy:atributy_strukturni|structural attributes]].+We have two options if we need to search only in a narrowly defined group of texts in the entire corpus. Either we create our own virtual [[en:manualy:kontext:subcorpus]], which we will then be able to select within the offered corpora, or we can restrict the query with several conditions (typically with the command [[en:pojmy:within|within]]). As a rule, we choose the first option when we know that we will need the subcorpus for a longer time or when its specification is complex. We use the second option when conducting ad hoc searches within some clearly defined text categories specified with the help of the basic [[en:pojmy:atributy_strukturni|structural attributes]].
  
-The New Query form allows for simplification by way of an additional form // Specify query according to the meta-information //, which is located underneath the context search and is activated with a click, similar to the (above mentioned) context specification.+The query form allows for simplification by way of an additional //Restrict search// tab located underneath the context search and activated with a click, similar to the (above-mentioned) context specification.
  
-[{{:en:manualy:kontext:hledani_subkorpus.png?direct&300|Form for searching in a subcorpus created ad hoc}}]+[{{:en:manualy:kontext:hledani_subkorpus_en.png?direct&300|Form for searching in a subcorpus created ad hoc }}] 
  
  
-Within this form it is possible to mark off the values of selected structural attributes that interest us. The form does not contain all structural attributes, but only those most often used in the given corpus (e.g. when searching in the [[en:cnk:syn2010|SYN2010]] it is [[en:pojmy:txtype_group|txtype_group]], [[en:pojmy:txtype|txtype]], [[en:pojmy:genre|genre]], [[en:pojmy:medium|med]], [[en:pojmy:srclang|srclang]]). The abbreviations used can be found in the [[en:seznamy:index|lists]] section.+Within this form it is possible to mark off the values of selected structural attributes that interest us. The form does not contain all structural attributes, but only those most often used in the given corpus (e.g. when searching in the [[en:cnk:syn2020|SYN2020]] it is [[en:pojmy:txtype_group|txtype_group]], [[en:pojmy:txtype|txtype]], [[en:pojmy:genre|genre]], [[en:pojmy:srclang|srclang]]). The abbreviations used can be found in the [[en:seznamy:index|lists]] section.
  
-In the final column we can find a list of the specific [[en:pojmy:opus|opuses]] or [[en:pojmy:doc|documents]] (based on the selected corpus), which correspond to the specified condition. If such a list would be too long, the given column contains only the number of items. If we select some categories from the menu, we can view an inventory of texts which meet the given conditions with the help of the button **refine selection** (bottom left). The column containing the list of texts is recalculated according to the currently marked criteria. We can continue in this way until we are satisfied with the demarcation of the data that we want to use for our search.+In the final column we can find a list of the specific [[en:pojmy:opus|opuses]] or [[en:pojmy:doc|documents]] (based on the selected corpus), which correspond to the specified condition. If such a list is too long, the given column contains only the number of items. If we select some categories from the menu, we can view an inventory of texts which meet the given conditions with the help of the button **Refine selection**. The column containing the list of texts is recalculated according to the currently marked criteria. We can continue this way until we are satisfied with the demarcation of the data we want to use for our search. It is possible to go back (option **Undo**) or cancel the entire selection (option **Reset selection**). The selection can also be saved for later use  (option **Save as a subcorpus draft**), creating a new virtual [[en:pojmy:subkorpus|subcorpus]]. Furthermore, a list of documents in the current selection can be easily retrieved (option **Save a list of documents**), which can be handy, e.g. if you want to find out which fiction books are in the parallel InterCorp corpus for a given language(s)
  
-For a more detailed specification it is necessary to either use the condition [[en:pojmy:within|within]] inside a [[en:kurz:pokrocile_dotazy#dotazovaci_jazyk|CQL]] query, or to create a new virtual [[en:pojmy:subkorpus|subcorpus]].+For a more detailed specificationit is necessary to use the condition [[en:pojmy:within|within]] inside a [[en:kurz:pokrocile_dotazy#dotazovaci_jazyk|CQL]] query.
  
-====== Recent queries ======+===== Paradigmatic query =====
  
-The item displays an overview of the most recent queries used (a simplified list of previous queries is also accessible directly from the query form, via a link above the input line)These queries can be filtered according to the query type or the currently used corpus, and only archived queries can be viewed as well. By clicking on the link //Edit and search//, we paste a previously specified constraints into the query form and we may either use it without any changes, or we may modify it further (e.g. change the corpus in which the query will be used, the query type, or we may specify the context). By clicking on the //Archive// option, we can name the query and permanently save it to the query history archive for later reuse.+[{{ :en:manualy:kontext:paradigmaticky_dotaz_en.png?direct&400| Paradigmatic query }}]
  
-====== Word list ======+In addition to the syntagmatic query described above (where we search for the set of [[en:pojmy:token|tokens]] matching a query and display them as [[en:pojmy:kwic|KWIC]]s in the form of a [[en:pojmy:konkordance|concordance]]), we can also use a [[en:pojmy:paradigmaticky|paradigmatic query]]. This search actually combines several individual syntagmatic sub-queries and returns the intersection of their frequency distributions. The result of paradigmatic querying is thus the set of [[en:pojmy:typ|types]] that match //all// specified syntagmatic queries. 
 + 
 +[{{ :en:manualy:kontext:paradigma_vysledek_en.png?direct&400| Results of paradigmatic query }}] 
 + 
 +In the query form, partial syntagmatic sub-queries should be entered in separate boxes (additional boxes can be added using the + button at the bottom or removed by clicking on the trashcan icon on the right). One can also specify parameters such as the default attribute, the minimum frequency of each syntagmatic query, and the position at which the frequency distribution will be applied to each of them. 
 + 
 +By default, the resulting inventory of words matching all queries (i.e. the intersection of individual sub-queries) is sorted by the last column (total absolute frequency). The sorting can be changed by clicking on any column header (absolute frequency of each query). The horizontal order of the partial frequencies is indicated by color coding.  
 + 
 +== Example == 
 + 
 +Searching the [[en:cnk:syn2020|SYN2020]] corpus for all lemmas whose forms sometimes end in //-ý//, sometimes in //-á//, and sometimes in //-é//, we can recover (albeit not 100% exactly) the group of adjectives inflected following the //mladý// paradigm (hard declension) whose nominative singular forms for all three genders (//mladý//, //mladá//, and //mladé//) can be found in the corpus. Partial syntagmatic sub-queries may look like this (these can only be entered using [[en:pojmy:cql|CQL]]): 
 + 
 +  - ''%%[word=".+ý"]%%'' 
 +  -  ''%%[word=".+á"]%%'' 
 +  - ''%%[word=".+é"]%%'' 
 + 
 +Attribute (generalization level): [[en:pojmy:lemma|lemma]]\\ 
 +Realization minimum frequency: 4 
 + 
 +In the results, we find adjectives such as //nový//, //celý//, //velký//, pronouns and numerals with adjective declension //každý//, //druhý//, //který//, but also words with another declension type that only show formal similarity to the query, e.g. the pronoun //svůj// (whose paradigm includes, among others, the forms //svý//, //svá//, and //své//). On the other hand, we will not find lemmas for which one of the forms is missing or too sparsely documented in the corpus (e.g. //šestatřicetiletý//, as its form //šestatřicetileté// only occurs three times in SYN2020, which is below the minimum frequency limit). 
 + 
 +==== The "never" and "always" conditions ==== 
 + 
 +Paradigmatic queries can be further refined using the conditions **never** (//Exclude//) and **always** (//Restrict search to//), which can be selected using the drop-down menu above each sub-query. 
 + 
 +The **never** condition allows you to **exclude** cases matched by the specified sub-query. If we were to add a fourth sub-query to the above example with the specification that it is a "never" condition, namely: 
 + 
 +**4.** ''%%[word=".+í"]%%'' 
 + 
 +we will find only such lemmas with adjective declension that do not typically form the nominative plural, such as //každý//, //celkový//, //dostatečný//, for which the forms //*každí//, //*celkoví// a //*dostateční// are not attested with above-limit frequency. 
 + 
 +The **always** condition defines a superset of types from which only those fully specified by the sub-queries (i.e., those having no occurrences outside of those queries and outside of the specified superset) are selected. Thus, the results cannot contain types that have realizations which are not matched by at least one sub-query. If we add a fourth condition to the example above with the specification **Restrict search to** (the "always" condition) and the value 
 + 
 +**4.** ''%%[lemma=".+ý"]%%'' 
 + 
 +we find only those lemmas ending in //-ý// that are entirely determined by conditions 1-3, i.e., they have no other realizations that remain unfound by these sub-queries. This corresponds to the lemma //odmaštěný//, which only occurs in SYN2020 in the forms //odmaštěný//, //odmaštěná//, //odmaštěné//, and all of them have at least four occurrences. 
 + 
 +The "always" and "never" conditions can be applied strictly, or their effect can be mitigated by putting a maximum percentage of exceptions in the results (the field //max. ratio of exceptions// above the query box). For example, suppose we are looking for words that //never// occur in the imperative. By increasing the percentage of exceptions to 1% (the value needs to be entered in the form 0.01), we can include verbs in which the imperative is represented by at most one percent of their forms.  
 + 
 +===== Word list =====
  
 The basic output of any query is a [[en:pojmy:konkordance|concordance]], i.e. a list of all the occurrences ([[en:pojmy:token|tokens]]) matching the query, along with their text surroundings. The **Word list** function evaluates the query in such a way that the result is a list of various words ([[en:pojmy:typ|types]]), matching the query, together with their absolute [[en:pojmy:frekvence|frequency]], [[en:pojmy:arf|ARF]] or number of documents in which the wanted phenomenon occurs. In this respect, the Word list function is analogous to [[en:manualy:kontext:frekvencni_distribuce|frequency distribution]], however its advantage is its speed and low computational complexity, because the extra step involving the concordance is not needed with the Word list. The basic output of any query is a [[en:pojmy:konkordance|concordance]], i.e. a list of all the occurrences ([[en:pojmy:token|tokens]]) matching the query, along with their text surroundings. The **Word list** function evaluates the query in such a way that the result is a list of various words ([[en:pojmy:typ|types]]), matching the query, together with their absolute [[en:pojmy:frekvence|frequency]], [[en:pojmy:arf|ARF]] or number of documents in which the wanted phenomenon occurs. In this respect, the Word list function is analogous to [[en:manualy:kontext:frekvencni_distribuce|frequency distribution]], however its advantage is its speed and low computational complexity, because the extra step involving the concordance is not needed with the Word list.
  
-[{{ en:manualy:kontext:seznam_slov_slovesa.png?direct&300|Form for creating word lists}}]+[{{ en:manualy:kontext:seznam_slov_slovesa_en.png?direct&300|Form for creating word lists }}]
  
 Various search parameters can be set in the form: Various search parameters can be set in the form:
Line 85: Line 138:
   * RE pattern (regular expression), to which the resulting words must correspond (if it is not submitted, the list will contain all items in the corpus if they fulfill the other specifications in the form)   * RE pattern (regular expression), to which the resulting words must correspond (if it is not submitted, the list will contain all items in the corpus if they fulfill the other specifications in the form)
   * minimum frequency   * minimum frequency
-  * whitelist – a list of pre-selected words (in a separate file) which we want to see in the resulting list 
-  * blacklist -- a list of pre-selected words (in a separate file) which we want to exclude from the resulting list 
   * option "Include non-words", which widens the search to words which are not composed only of alphabetic characters   * option "Include non-words", which widens the search to words which are not composed only of alphabetic characters
 +  * whitelist -- a list of pre-selected words (in a separate file) which we want to see in the resulting list
 +  * blacklist -- a list of pre-selected words (in a separate file) which we want to exclude from the resulting list
  
 Among the output option settings we can find a selection of either the absolute [[en:pojmy:frekvence|frequency]], [[en:pojmy:arf|ARF]] or a document count. Furthermore there is also the possibility to choose a specific output attribute (or attributes). These attributes **need not be** identical to the positional attribute selected in the top section of the form, on which all the above mentioned filters are applied. This enables us to create e.g. a frequency list of all verbs by selecting the attribute [[en:pojmy:tag|tag]] in the top section, applying the condition for a verb as in [[en:seznamy:tagy#pozice_1_-_slovni_druh|V.*]] and finally by "switching" the output type to [[en:pojmy:lemma|lemma]] – an example of such a query is shown in the picture. Among the output option settings we can find a selection of either the absolute [[en:pojmy:frekvence|frequency]], [[en:pojmy:arf|ARF]] or a document count. Furthermore there is also the possibility to choose a specific output attribute (or attributes). These attributes **need not be** identical to the positional attribute selected in the top section of the form, on which all the above mentioned filters are applied. This enables us to create e.g. a frequency list of all verbs by selecting the attribute [[en:pojmy:tag|tag]] in the top section, applying the condition for a verb as in [[en:seznamy:tagy#pozice_1_-_slovni_druh|V.*]] and finally by "switching" the output type to [[en:pojmy:lemma|lemma]] – an example of such a query is shown in the picture.
 +
 +===== Keyword analysis =====
 +
 +[{{ :en:manualy:kontext:analyza_k_slov_en.png?direct&400|List of keywords in the ORAL v1 corpus compared to the SYN2020 reference corpus}}]
 +
 +The KonText interface can generate an inventory of keywords, i.e., forms or lemmas appearing in the selected (sub)corpus significantly more often than in the reference (sub)corpus, reflecting common language usage. (The analysis of keywords in the users’ texts is enabled by [[en:manualy:kwords|the specialized KWords application]].)
 +
 +Besides the corpus in which we want to find the terms in question, we also have to specify a reference corpus (or a subcorpus, e.g. if we want to compare a corpus consisting mainly of journalistic texts, such as one of SYN corpora, with a subcorpus of fiction texts: SYN2020-BEL). Next, we specify by which positional attribute the terms should be searched, by which metric they should be sorted (Log-likelihood, Chi-square, or Difference index), and possibly we also specify the desired minimum or maximum frequency. Sought-after terms can further be filtered using a regular expression; the default .* expression will display all results (or, rather, the first 1000 occurrences).
 +
 +The resulting list of keywords in the form of a table is sorted according to the selected metric, with the remaining two also displayed, followed in the next columns by the absolute and relative frequency values in both corpora. The list of found keywords can be viewed in both corpora in the respective concordance through the positive filter (<fc #4682b4>p</fc> to the right of the absolute frequency value).
 +
 +
 +
 +===== Recent queries =====
 +
 +The item displays an overview of the most recent queries used (a simplified list of previous queries is also accessible directly from the query form, via a link above the input line). These queries can be filtered according to the query type or the currently used corpus, and only archived queries can be viewed as well. By clicking on the link //Edit and search//, we paste a previously specified constraints into the query form and we may either use it without any changes, or we may modify it further (e.g. change the corpus in which the query will be used, the query type, or we may specify the context). 
 +
 +By clicking on the gear and then on the **Archive** option, we can name the query and permanently save it to the query history archive for later reuse. The complete status of the query form is saved, e.g. also the selected text types.
 +
 +
  
 ---- ----