
Lesson 1: Introduction

This page provides a basic overview of how to use the EEBO corpus and how to input the data into the search interface using different query types.

Corpus selection

After completing the online registration and logging into KonText, we can begin our first query in the EEBO corpus. First, we need to select the corpus we intend to work with. The default corpus in KonText is syn2015. Clicking on the syn2015 icon opens a menu with all of the available corpora. If we have worked with KonText before, we may also see the list my favorite corpora on the left side of the menu; the list of featured corpora is on the right side. If the EEBO corpus is in neither list, we click on the all corpora icon, and a search box appears where we can type part of the name or description of the corpus we want. We type in “EEBO” and select the corpus in the dropdown menu. Clicking on the star next to the icon of the selected corpus adds EEBO to our favorite corpora, so that it will appear in the my favorite corpora list the next time we work with KonText. Now we can type the query in the Query box.

First query

Now we can type any word or combination of words into the query line of the KonText interface and observe how often it occurs in the corpus. To run the query, click on the Search button or press the Enter key.

Form for creating a query

We can try to find the following in the EEBO corpus:

  1. names of English monarchs ruling in the period 1400-1700
  2. punctuation marks such as a question mark – ? (for interrogative sentences) or an exclamation mark – !
  3. some words that have changed their meanings since the Early Modern English period
    • silly, which once meant worthy or blessed
    • myriad, which once referred to a specific number, i.e. 10,000
    • meat, which originally had a more general meaning, i.e. food

You can check if your query search worked correctly (corpus: EEBO, query type: basic):

Query       Number of hits    Relative frequency (i.p.m.)
Charles I              816                           0.94
?                1,800,519                       2,064.02
silly                7,113                           8.15
myriad                  42                           0.05
meat                41,990                          48.14

It should be noted that the EEBO corpus contains approximately 730 million words and therefore the word myriad (with 42 occurrences) has a relative frequency of 0.05 instances per million (i.p.m.). Relative frequency is essential when comparing corpora of different sizes, as 10 hits in a corpus containing 100 million words do not represent the same frequency as 10 hits in a corpus containing twice as many words.
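The relative-frequency calculation can be sketched in a few lines of Python. This is only an illustration: the function name `ipm` is made up for this example and is not part of KonText. Note also that the values in the table appear to be computed against the number of corpus positions (tokens, presumably including punctuation) rather than the number of words. Dividing the hits by the i.p.m. values gives roughly 872 million positions, but this figure is inferred from the table and not taken from the corpus documentation.

```python
def ipm(hits: int, corpus_size: int) -> float:
    """Relative frequency in instances per million (i.p.m.)."""
    return hits * 1_000_000 / corpus_size

# The same absolute frequency in corpora of different sizes.
small = ipm(10, 100_000_000)  # corpus of 100 million words
large = ipm(10, 200_000_000)  # corpus twice as large

# Corpus size implied by the "meat" row of the table above.
# This is an inference from the table, not a documented figure.
implied_size = 41_990 * 1_000_000 / 48.14

print(small, large)
print(round(implied_size / 1_000_000), "million positions (approx.)")
print(round(ipm(42, round(implied_size)), 2))  # i.p.m. of "myriad"
```

The first line of output shows that 10 hits are twice as frequent, in relative terms, in the smaller corpus as in the larger one.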

The searched word or phrase, which is highlighted in pink in the interface, is called the KWIC (key word in context). The whole line is called a concordance line and is part of the concordance (the list of all concordance lines, i.e. all occurrences of the searched words together with their contexts).

Concordance list for silly
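To make the terms KWIC, concordance line and concordance more concrete, here is a minimal Python sketch that builds concordance lines from a list of tokens. It does not reflect how KonText works internally; the function `kwic`, the window size and the sample sentence are all invented for illustration (the sentence is not taken from EEBO).

```python
def kwic(tokens, keyword, window=3):
    """Return concordance lines as (left context, KWIC, right context)."""
    lines = []
    for i, tok in enumerate(tokens):
        # Compare case-insensitively so that "Silly" also counts as a hit.
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

# Made-up sample text, not an EEBO quotation.
text = "the silly sheep and the Silly shepherd were thought blessed"
concordance = kwic(text.split(), "silly")

for left, node, right in concordance:
    print(f"{left:>20} | {node} | {right}")
```

Each tuple corresponds to one concordance line, and the list of all tuples corresponds to the concordance.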

New Query

If we wish to begin a new search in KonText, we click on the item Query → New Query located in the top menu.

Useful tip: The easiest way to create a new query is to click directly on the icon in the upper-left corner.

Query types

There are 6 different query types in the KonText interface (basic, lemma, phrase, word form, character, CQL). Each of them suits a different kind of research. As the EEBO corpus is not lemmatized, the lemma query type cannot be selected.

Query type: Word Form

Word Form is one of the most user-friendly query types. With Word Form we can search the corpus for one specific form of a word. If we type apple into the query line, only occurrences that match this form will appear in the results.

The only difference between the query and the results may be letter case. The default setting for Word Form is case-insensitive, which means that the results will include both lower- and upper-case forms, i.e. the results for the query god will include god, God and also GOD. Likewise, the results for apple will include Apple with an upper-case A.

For the query to be case-sensitive, we need to tick the box Match case located beneath the query line. If we then enter James with an upper-case J, the concordance list will include only the exact match James, excluding james.
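The effect of the Match case box can be sketched in Python as follows. This is a simplified model and not KonText's actual implementation; the function `word_form_hits` and the token list are hypothetical.

```python
def word_form_hits(tokens, query, match_case=False):
    """Return the tokens that a Word Form query would match (simplified)."""
    if match_case:
        # Match case ticked: only the exact spelling is accepted.
        return [t for t in tokens if t == query]
    # Match case not ticked: letter case is ignored.
    return [t for t in tokens if t.lower() == query.lower()]

# Made-up tokens for illustration.
tokens = ["god", "God", "GOD", "gods", "James", "james"]

insensitive = word_form_hits(tokens, "god")
sensitive = word_form_hits(tokens, "James", match_case=True)

print(insensitive)
print(sensitive)
```

In both modes the form gods is left out, because a Word Form query only ever tolerates differences in letter case.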

Query type: Basic

Basic query is ideal for elementary searches which do not require a very high degree of accuracy (in many respects this query type is comparable to general search engines such as Google). When the query is a dictionary form (lemma), a basic query normally searches for all of its forms. As the EEBO corpus is not lemmatized, this is not possible here; only forms that exactly match the query appear in the results.

Query type: Phrase

This query type is used especially for finding multiword expressions, as the query types word form and lemma do not allow for searching for more than one word. With the phrase query type we can only find the exact wording of a phrase. In this respect it is similar to the basic query.

Let's try searching for some phrases in the EEBO corpus:

  • almighty god
  • good god

Query           Case sensitive (all results in lower case)    Case insensitive
almighty god    only almighty god appears in the results      Almighty God, almighty God, Almighty GOD, almighty god etc.
good god        only good god appears in the results          good God, Good God, good god, good GOD etc.

In CQL syntax the equivalent of this query would be: [word="almighty"][word="god"].
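The conversion of a phrase into CQL can be sketched in Python. The helper `phrase_to_cql` is hypothetical and only mimics the pattern shown above. The `(?i)` prefix used for the case-insensitive variant is an assumption based on common CQL usage, so check it against the KonText documentation before relying on it.

```python
def phrase_to_cql(phrase: str, match_case: bool = True) -> str:
    """Build a CQL expression with one [word="..."] position per word.

    Regular-expression metacharacters in the phrase are not escaped here.
    """
    # Assumption: "(?i)" switches a CQL value to case-insensitive matching.
    prefix = "" if match_case else "(?i)"
    return "".join(f'[word="{prefix}{w}"]' for w in phrase.split())

print(phrase_to_cql("almighty god"))
print(phrase_to_cql("good god", match_case=False))
```

The first call reproduces the expression given above; the second shows how a case-insensitive phrase query might look under the stated assumption.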

Query type: Character

If we wish to find all of the words that contain a string of consecutive characters (e.g. a root), this query type is the most suitable. With the character query type we can find all of the words that contain those characters, preceded or followed by any number of characters (or none).

For example, if we type wise into the query line, the following words will appear in the results:

  • likewise
  • wise
  • otherwise
  • wisedome
  • wisely
  • wiser
  • twise

In CQL syntax we would write the query as follows: [word=".*wise.*"]
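The pattern `.*wise.*` can be tried out on a small word list in Python. This sketch assumes that a CQL value has to match the whole word, which is why `re.fullmatch` is used. The function `character_query` and the extra words wife and wish are made up for illustration.

```python
import re

def character_query(words, fragment):
    """Return the words that contain the given string of characters."""
    # Equivalent of the CQL value ".*fragment.*", matched against the whole word.
    pattern = re.compile(f".*{re.escape(fragment)}.*")
    return [w for w in words if pattern.fullmatch(w)]

words = ["likewise", "wise", "otherwise", "wisedome", "wisely",
         "wiser", "twise", "wife", "wish"]

matches = character_query(words, "wise")
print(matches)
```

The words wife and wish are filtered out because they do not contain the full string wise.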

Task

  • Find all of the words that contain the following roots:
    • like
    • blood
    • craft

After the concordance list appears, click on the Frequency button and select Node forms in the dropdown menu. All of the words containing this string of characters will appear arranged according to the number of hits in the corpus.

Frequency list of wise
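What the Node forms frequency list does can be approximated in Python with a counter. The list of hits below is invented for illustration and does not reflect real EEBO frequencies; the function name `node_form_frequencies` is likewise hypothetical.

```python
from collections import Counter

def node_form_frequencies(hits):
    """Count each matched word form and sort by number of hits (descending)."""
    return Counter(hits).most_common()

# Made-up KWIC forms, as if taken from a concordance for the string "wise".
hits = ["otherwise", "wise", "likewise", "wise", "otherwise", "wise"]

frequency_list = node_form_frequencies(hits)
for form, count in frequency_list:
    print(f"{form:<12}{count:>4}")
```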

Query type: CQL

Corpus query language or CQL is the most universal query type that we can use when searching the EEBO corpus. All of the aforementioned query types are converted into CQL in the KonText interface. How to use CQL will be explained in more advanced lessons of this tutorial.

Basic information about the EEBO corpus

If we wish to find out basic information about the corpus we are using (e.g. EEBO), we can click on the name of the corpus located beneath the KonText icon. A window with basic information about EEBO will then be displayed, where we can check the size of the corpus or find out which metadata are available for it.