An introduction to searching InterCorp using the NoSketch Engine interface

We make an effort to keep the following instructions up to date. However, the NoSketch Engine search interface is still being developed, so inconsistencies may occur. We apologize for any such problems. Please report them, and send any questions, to the project coordinator's email address listed under Contacts.

Main differences in comparison with the Park interface

  • Same environment for searching both monolingual and parallel corpora
  • Faster response, less prone to error conditions
  • More features for processing query results (sorting, frequency distribution, collocations)
  • An option to display results even when the equivalent is missing in one or more of the queried languages (include empty lines)
  • The corpus size is measured in the number of positions (words including punctuation), not words
  • The number of results indicates the number of positions that satisfy the query, while in Park it is the number of segments
  • Unless the range of texts is restricted by the user, the search is performed in all texts, including collections (not only in the core of the corpus)
  • Unless a different order is specified by the user, the search results are sorted according to the order of texts in the corpus; this order puts all texts in the core before collections
  • Texts to be queried are selected by creating a subcorpus
  • The displayed list of texts to be queried does not yet reflect criteria restricting the range of the texts
  • Query results cannot be exported directly into a spreadsheet (xls) file yet; however, a concordance file created by the Save function using the Text format can be imported into a spreadsheet; the character encoding is UTF-8 and the column separator is a tab
  • Query results cannot be displayed horizontally, with each language in a separate row rather than in a column   
  • Only the current version of the corpus can be searched, not the previous one
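
The spreadsheet workaround mentioned in the list above relies on the saved file being UTF-8 text with tab-separated columns. As a minimal sketch under that assumption, such a file could also be read with Python's standard csv module. The sample data and the two-column layout below are hypothetical; the actual columns depend on the options chosen when saving.

```python
import csv
import io


def read_concordance(stream):
    """Yield each non-empty line of a tab-separated concordance as a list of columns."""
    # QUOTE_NONE keeps quotation marks in the text as ordinary characters.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row:
            yield row


# Hypothetical two-line sample standing in for a saved concordance file.
sample = "Dobrý den\tGood morning\nNa shledanou\tGoodbye\n"
rows = list(read_concordance(io.StringIO(sample)))
print(rows)
```

For a real file, the same function could be given open("concordance.txt", encoding="utf-8", newline=""), where the file name is only a placeholder.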

Access to the corpus

InterCorp is accessible using the same login as for other CNC corpora. If you do not have a login for the CNC corpora yet, you can get one for free by registering on the Registration page.

NoSketch Engine is an integrated interface for searching both monolingual and parallel corpora. After entering your user ID and password a page with a default corpus opens (e.g. syn2010). After clicking the button with the name of the default corpus a menu with available corpus types shows up. Click on the arrow next to the item Paralelní korpus InterCorp to open a list of the language-specific corpus parts.

Selecting languages

Click on one of the languages below Paralelní korpus InterCorp to choose the primary language for your search. A non-empty query is required for the primary language, i.e. its query box must be filled in. The order of the languages also matters when you wish to create a subcorpus (see below): the range of texts for a subcorpus can be specified only for the primary language. In other respects, the order of languages is irrelevant.

After choosing the primary language a brief description of the selected part of the corpus appears in the page heading together with its size, measured in the number of positions (word forms and punctuation symbols). To add an additional language choose the relevant corpus part within the frame Aligned corpora and then click on Add. For the additional language a query need not be entered. Tick include empty lines if you wish the result to include concordances that do not have an equivalent in the given language. More languages can be added in a similar way. Searching one part of the parallel corpus only, i.e. within a single language, is also possible. If so, do not add other languages and proceed to selecting the type of query and specifying the query itself.
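
The corpus size shown in the page heading counts positions, i.e. word forms and punctuation symbols, rather than words alone. The following sketch illustrates the difference on a single sentence. The regular expression is a deliberately simplified stand-in and is not the tokenizer actually used to build the corpus.

```python
import re


def count_positions(text):
    """Return (positions, words) for a text under a simplified tokenization."""
    # A position is either a run of word characters or one punctuation symbol.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    # Words are the positions that start with a word character.
    words = [token for token in tokens if re.match(r"\w", token)]
    return len(tokens), len(words)


positions, words = count_positions("Ahoj, jak se máš?")
print(positions, words)  # 6 positions (4 words plus 2 punctuation symbols)
```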

Entering a query

You can choose from six Query Types (see below). All query types except Basic can handle regular expressions and are case-sensitive, with one exception: the query type Word Form is case-insensitive by default, but Match case can be turned on. For the second and any further languages you can also specify whether the concordances should or should not contain the terms entered in the query box.

  • Basic - searches for the given word form, case-insensitively; if the given form is also a basic dictionary form (lemma), all of its inflected forms are searched for as well
  • Lemma - searches for all forms of the given lemma
  • Phrase - searches for the given sequence of word forms
  • Word form - searches for the given word form
  • Character - searches for word forms containing the given sequence of characters
  • CQL - searches for one or more word forms according to the given expression in the CQL query language. When entering morphological tags for Czech, the insert tag helper may be useful: it lets the user enter codes at the appropriate positions of the tag using a menu of attributes and their corresponding values. All languages offer the insert "within" option, which helps to filter the query results according to metadata, i.e. bibliographic and other data relating to the texts. For a list of attributes and their values, see here. The 'attribute="value"' pairs can be combined using the operator & (logical conjunction). The whole "within" condition must be placed at the end of the query, after the expressions specifying one or more positions (in square brackets). A single query can include multiple "within" conditions. The following two example queries produce identical results, namely sentences including nouns in the vocative case in original Czech dramas:
    [tag="N...5.*"] within <div txtype="drama" & srclang="cs" />
    [tag="N...5.*"] within <div txtype="drama" /> within <div srclang="cs" />
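
For users who assemble many such queries, the placement rule for "within" conditions can be sketched as a small Python helper. This is only an illustration: the function names within_condition and build_query are hypothetical and not part of NoSketch Engine, and attribute values are inserted as given, without any escaping.

```python
def within_condition(structure, **attrs):
    """Build one "within" condition from attribute="value" pairs joined by &."""
    pairs = " & ".join(f'{name}="{value}"' for name, value in attrs.items())
    return f"within <{structure} {pairs} />"


def build_query(positions, *conditions):
    """Place all "within" conditions after the position expressions."""
    return " ".join([positions, *conditions])


vocative_noun = '[tag="N...5.*"]'

# One condition combining both attributes with &.
combined = build_query(
    vocative_noun,
    within_condition("div", txtype="drama", srclang="cs"),
)

# Two separate conditions chained one after the other.
chained = build_query(
    vocative_noun,
    within_condition("div", txtype="drama"),
    within_condition("div", srclang="cs"),
)

print(combined)
print(chained)
```

The two printed strings reproduce the two example queries above and can be pasted into the CQL query box.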

Click on Make Concordance to evaluate the query.

Query results

The response to the query is shown as a list of concordances. The menu, located in the leftmost column of the page, can be used to modify the view, to store, sort or filter the concordances, to search for collocations and to compute statistics based on the result. The menu options operate on the language in focus, which is highlighted by a light blue background; the parameters of the relevant part of the corpus are shown in the page heading. The language in focus can be changed by clicking on the column heading that names the required corpus part. To move the menu above the concordances, click on Switch menu position. To enter a new query, click on Concordance. To see results hidden behind the right edge of the window, widen the window or use the horizontal scroll bar, which can be found by scrolling down to the bottom of the concordance page.

Subcorpora - restricting the range of searched texts

By default, searching is performed on all texts in the selected languages. The range of texts can be restricted before evaluating the query by clicking on the Subcorpus item in the menu. The restrictions apply to the primary language. To select which texts should be included, tick the appropriate values of the relevant attributes in the Subcorpus frame. If no value is selected for a given attribute, the search range is not restricted by that attribute. This is equivalent to selecting all values by clicking the Select All button. However, selecting all or many titles by ticking the values of the div.id attribute results in the error message "The selection condition you have specified is too long. Please create a subcorpus instead."

To choose texts for a subcorpus, click on create new in the Subcorpus frame and specify the relevant attributes in the new window. In addition to ticking the appropriate attribute values, the subcorpus can be specified in more detail by using the Custom 'within' condition. In most cases the condition should be specified for attributes on the level of the document part (div). For a list of attributes and their values see here. If you wish to create a subcorpus of original Czech plays, select div from the menu and enter the following in the box for specifying the condition:

txtype="drama" & srclang="cs"

The same result would be achieved by checking the boxes drama and cs in the corresponding lists of attributes. To create the subcorpus, name it in the box New subcorpus name and click on Create Subcorpus. While entering a query the subcorpus can be invoked by choosing one of the Available subcorpora in the Subcorpus frame.

Last update: 31 May 2013