Differences

This shows you the differences between two versions of the page.

Old revision: en:kurz:hledani_v_paralelnim_korpusu [2019/12/19 23:42] – [Access to the corpus] Alexandr Rosen
New revision: en:kurz:hledani_v_paralelnim_korpusu [2022/11/23 15:09] (current) – [Entering a query] Alexandr Rosen
Line 4: Line 4:
  
  
-====Main differences in comparison with the Park interface==== 
- 
-  * Same environment for searching both monolingual and parallel corpora 
-  * Faster response, less prone to error conditions 
-  * More features for processing query results (sorting, frequency distribution, collocations) 
-  * An option to display results even when the equivalent is missing in one or more of queried languages (**include empty lines**) 
-  * The corpus size is measured in the number of positions (words including punctuation), not words 
-  * The number of results indicates the number of positions that satisfy the query, while in Park it was the number of segments 
-  * Unless the range of texts is restricted by the user, the search is performed in all texts, including collections (not only in the core of the corpus)
-  * The search results are shuffled, which may result in a longer response time. If this is not desirable, the user may ask for query results to be displayed in the order of texts in the corpus; this order puts all texts in the core before collections. To do so, uncheck **View/General concordance view options/Other options/Shuffle concordance lines by default**.
-  * The user can create and save their own subcorpus. 
-  * Query results cannot be displayed horizontally, with each language in a separate row rather than in a column 
  
 ====Access to the corpus====
Line 22: Line 10:
  
 [[https://kontext.korpus.cz/first_form|KonText]] is an integrated interface for searching both monolingual and parallel corpora. After you enter your user ID and password, a page with a default corpus opens (e.g. **syn2015**). After clicking **All corpora** and **InterCorp**, a list of available languages shows up.
 +
 +For a detailed description of KonText and its features see the [[https://wiki.korpus.cz/doku.php/en:manualy:kontext:index|KonText interface manual]]. 
 +This page gives only a few basic hints and the specifics of using KonText to query InterCorp.
  
 ====Selecting languages====
  
-Click on one of the languages, such as **InterCorp v9 Czech**, to choose the primary language for your search. A non-empty query is required for the primary language, so the query box for this language must be filled in. The order of the languages also matters when you wish to create a subcorpus (see below): the range of texts used to create a subcorpus can be specified only for the primary language. In other respects, the order of languages is irrelevant.
+Click on one of the languages, such as **InterCorp v15 Czech**, to choose the primary language for your search. A non-empty query is required for the primary language, so the query box for this language must be filled in. The order of the languages also matters when you wish to create a subcorpus (see below): the range of texts used to create a subcorpus can be specified only for the primary language. In other respects, the order of languages is irrelevant.
  
 After you choose the primary language, a brief description of the selected part of the corpus appears in the page heading together with its size, measured in the number of tokens (so-called positions, i.e. word forms and punctuation symbols). To add an additional language, choose the relevant corpus part within the frame **Aligned corpora** and then click on **Add**. For the additional language a query need not be entered. Tick **include empty lines** if you wish the result to include concordances that do not have an equivalent in the given language. More languages can be added in a similar way. Searching only one part of the parallel corpus, i.e. within a single language, is also possible. In that case, do not add other languages and proceed to selecting the type of query and specifying the query itself.
Line 31: Line 22:
 ====Entering a query====
  
-You can choose from six **Query Types** (see below). All types of queries except **Basic** are case-sensitive and can handle regular expressions. For the query type **Word Form** the default is case-insensitive but **Match case** can be turned on. For the second and other languages you can also specify whether the concordances should or should not include terms specified in the query box.
-
-  * **Basic** - searches for the given word form, case-insensitive; if the given form is at the same time a basic dictionary form ([[en:pojmy:lemma|lemma]]), it also searches for all of its inflected forms
-  * **Lemma** - searches for all forms of the given lemma
-  * **Phrase** - searches for the given sequence of word forms
-  * **Word form** - searches for the given word form
-  * **Character** - searches for word forms containing the given sequence of characters
-  * **CQL** - searches for one or more word forms according to the given expression in the [[https://www.sketchengi.co.uk/corpus-querying/|**CQL**]] query language. While entering morphological tags for Czech, the user might find the helper option **insert tag** useful; it lets you enter codes at the appropriate position of the tag using a menu of attributes and their corresponding values. All languages include the **insert "within"** option, which helps to filter the query results according to metadata, i.e. bibliographic and other data relating to the texts. For a list of attributes and their values, see [[http://ucnk.ff.cuni.cz/intercorp/?req=page:metadata&lang=en|here]]. The 'attribute="value"' pairs can be combined using the operator & (logical conjunction). The whole "within" condition must be placed at the end of a query, following expressions specifying one or more positions (in brackets). A single query can include multiple "within" conditions. The following two example queries produce identical results, namely sentences including nouns in the vocative case in original Czech dramas:
+You can switch between the simple and **Advanced Query** options. In the Advanced Query option you can use the [[en:pojmy:cql|Corpus Query Language]]. Using the **CQL** language you can search for one or more word forms according to the given expression. While entering morphological tags for Czech, the user might find the helper option **insert tag** useful; it lets you enter codes at the appropriate position of the tag using a menu of attributes and their corresponding values. All languages include the **insert "within"** option, which helps to filter the query results according to metadata, i.e. bibliographic and other data relating to the texts. For a list of attributes and their values, see [[http://ucnk.ff.cuni.cz/intercorp/?req=page:metadata&lang=en|here]]. The 'attribute="value"' pairs can be combined using the operator & (logical conjunction). The whole "within" condition must be placed at the end of a query, following expressions specifying one or more positions (in brackets). A single query can include multiple "within" conditions. The following two example queries produce identical results, namely sentences including nouns in the vocative case in original Czech dramas:
  
 <code> <code>