Lesson 1: Introduction
This page provides a basic overview of how to use the EEBO corpus and how to input the data into the search interface using different query types.
Corpus selection
After successfully completing the online registration and logging into KonText, we can begin with our very first query in the EEBO corpus. First of all, we need to select the corpus we intend to work with. The default corpus of KonText is the syn2015 corpus. Clicking on the icon syn2015 opens a menu with all of the available corpora. If we have worked with KonText before, we might also see the list of my favorite corpora on the left side of the menu. The list of featured corpora is on the right side. If the EEBO corpus is not included in either list, we click on the icon all corpora and a search box appears where we can type part of the name or description of the corpus we plan to work with. We type in “EEBO” and select the corpus in the dropdown menu. Clicking on the star next to the icon with the selected corpus adds the EEBO corpus to the list of our favorite corpora, so that it will be included in the list of my favorite corpora the next time we work with KonText. Now we can type the query in the Query box.
First query
Now we can type any word or combination of words into the query line of the KonText interface and observe how often it occurs in the corpus. To run the query, we click on the Search button or press the Enter key.
We can try to find the following in the EEBO corpus:
- names of English monarchs ruling in the period 1400-1700
- punctuation marks such as a question mark – ? (for interrogative sentences) or an exclamation mark – !
- some words that have changed their meanings since the Early Modern English period:
  - silly, which once meant worthy or blessed
  - myriad, which once referred to a specific number, i.e. 10,000
  - meat, which originally had a more general meaning, i.e. food
We can now check if the query search worked correctly (corpus: EEBO, query type: basic):
Query | Number of hits | Relative frequency (i.p.m.) |
---|---|---|
Charles I | 816 | 0.94 |
? | 1,800,519 | 2,064.02 |
silly | 7,113 | 8.15 |
myriad | 42 | 0.05 |
meat | 41,990 | 48.14 |
It should be noted that the EEBO corpus contains approximately 872 million words, and therefore the word myriad (with 42 occurrences) has a relative frequency of 0.05 instances per million (i.p.m.). Relative frequency is essential when comparing corpora of different sizes, as 10 hits in a corpus of 100 million words do not represent the same frequency as 10 hits in a corpus containing twice as many words.
The searched word or phrase, highlighted in pink in the concordance list, is called the KWIC (key word in context). The whole line is called a concordance line and is part of the concordance (the list of all concordance lines, i.e. all occurrences of the searched words together with their contexts).
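The calculation behind the i.p.m. column can be sketched in a few lines of Python. This is only an illustration: the function name `ipm` is hypothetical, and the corpus size is the rounded figure of 872 million words mentioned in the text, so the second decimal place may differ slightly from what KonText reports for some queries.

```python
def ipm(hits: int, corpus_size: int) -> float:
    """Relative frequency in instances per million (i.p.m.)."""
    return hits * 1_000_000 / corpus_size

# Assumption: rounded corpus size taken from the text, not an exact token count.
EEBO_SIZE = 872_000_000

# 42 occurrences of "myriad" in roughly 872 million words.
myriad_ipm = ipm(42, EEBO_SIZE)
print(round(myriad_ipm, 2))

# The same 10 hits correspond to different relative frequencies
# in corpora of different sizes.
small_corpus = ipm(10, 100_000_000)
large_corpus = ipm(10, 200_000_000)
print(small_corpus, large_corpus)
```

The second comparison shows why raw hit counts should not be compared across corpora: the relative frequency in the smaller corpus is twice as high.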
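To make the terms KWIC and concordance line more concrete, here is a minimal sketch of how a concordance could be built from a list of tokens. The function `kwic` and the sample sentence are invented for illustration; this is not how KonText itself is implemented.

```python
def kwic(tokens, query, window=3):
    """Return one (left context, keyword, right context) tuple per hit."""
    lines = []
    for i, token in enumerate(tokens):
        # Case-insensitive comparison, for the purposes of this sketch.
        if token.lower() == query.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines

# Invented sample text, not taken from EEBO.
sample = "the silly sheep and the Silly shepherd".split()
concordance = kwic(sample, "silly", window=2)

for left, keyword, right in concordance:
    print(f"{left:>10} | {keyword} | {right}")
```

Each tuple corresponds to one concordance line, and the middle element is the KWIC.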
New Query
If we wish to begin a new search in KonText, we click on the item Query → New Query located in the top menu.
Query types
There are six different query types in the KonText interface (basic, lemma, phrase, word form, character, CQL). Each of them is suitable for different kinds of research. As the EEBO corpus is not lemmatized, it is not possible to select lemma as the query type.
Query type: Word Form
Word Form is one of the most user-friendly query types. With Word Form we can search the corpus for one specific form of a word. If we type apple into the query line, only those occurrences that match the query exactly will appear in the results. With case-sensitive matching, Apple with upper-case A will therefore not be included.
By default, however, Word Form is case-insensitive, so the only difference between the query and a result can be letter case: the results for the query god will include god, God and also GOD.
In order for the query to be case-sensitive, we need to tick the box Match case located beneath the query line. If we enter James with upper-case J, the concordance list will include only exact matches, i.e. James, excluding james.
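The effect of the Match case box can be imitated with a small Python sketch. The function `word_form_hits` and its `match_case` parameter are hypothetical names chosen for this example, and the token list is invented.

```python
def word_form_hits(tokens, query, match_case=False):
    """Return the tokens that match the query as a whole word form."""
    if match_case:
        # Case-sensitive: only identical strings count as hits.
        return [t for t in tokens if t == query]
    # Case-insensitive: compare lower-cased forms.
    return [t for t in tokens if t.lower() == query.lower()]

# Invented token list for illustration.
tokens = ["god", "God", "GOD", "gods", "James", "james"]

insensitive = word_form_hits(tokens, "god")
sensitive = word_form_hits(tokens, "James", match_case=True)

print(insensitive)
print(sensitive)
```

Note that gods is not returned in either mode, because a word form query matches whole tokens only.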
Query type: Basic
Basic query is ideal for elementary searches which do not require a very high degree of accuracy (in many respects this query type is comparable to basic search engines such as Google). In lemmatized corpora, entering a dictionary form (lemma) also retrieves all of its word forms, such as slept, sleeping and sleeps for the lemma sleep. As the EEBO corpus is not lemmatized, this is not possible here, so only those forms that match the query exactly appear in the results.
Query type: Phrase
This query type is used especially for finding multiword expressions, as the query types word form and lemma do not allow for searching for more than one word. With the phrase query type we can only find the exact wording of a phrase. In this respect it is similar to the basic query.
Let's try searching for some phrases in the EEBO corpus:
- almighty god
- good god
Query | almighty god | good god |
---|---|---|
case sensitive (all of the results will be in lower case) | only almighty god appears in the results | only good god appears in the results |
case insensitive | Almighty God, almighty God, Almighty GOD, almighty god etc. | good God, Good God, good god, good GOD etc. |
In CQL syntax, the equivalent of this query would be: [word="almighty"][word="god"]
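The conversion from a phrase to the CQL form shown above follows a simple pattern: one `[word="..."]` position per word. A minimal sketch, assuming the phrase contains only plain words (the helper `phrase_to_cql` is hypothetical and does not escape quotes or regular expression metacharacters):

```python
def phrase_to_cql(phrase: str) -> str:
    """Build a CQL query with one [word="..."] position per word."""
    return "".join(f'[word="{word}"]' for word in phrase.split())

almighty_cql = phrase_to_cql("almighty god")
good_cql = phrase_to_cql("good god")

print(almighty_cql)
print(good_cql)
```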
Query type: Character
If we wish to find all of the words that contain a string of consecutive characters (e.g. the root of a word), then this query type is the most suitable. With the character query type we can find all of the words that contain the given characters and are preceded or followed by any number of characters (or none).
For example, if we type wise into the query line, the following words will appear in the results:
- likewise
- wise
- otherwise
- wisedome
- wisely
- wiser
- twise
In CQL syntax we would write the query as follows: [word=".*wise.*"]
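The pattern `.*wise.*` can be tried out with Python's `re` module. The sketch below assumes that the expression has to match the whole token, which is why `fullmatch` is used; the function `character_hits` is a hypothetical name, and the token list combines the words listed above with two invented non-matching words.

```python
import re

def character_hits(tokens, chars):
    """Return tokens containing chars, with anything (or nothing) around it."""
    pattern = re.compile(".*" + re.escape(chars) + ".*")
    return [t for t in tokens if pattern.fullmatch(t)]

tokens = ["likewise", "wise", "otherwise", "wisedome", "wisely",
          "wiser", "twise", "wish", "wife"]

hits = character_hits(tokens, "wise")
print(hits)
```

The words wish and wife are left out because they do not contain the full string wise. This sketch is case-sensitive.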
Task
- Find all of the words that contain the following roots:
  - like
  - blood
  - craft
After the concordance list appears, click on the Frequency button and select Node forms in the dropdown menu. All of the words containing this string of characters will appear arranged according to the number of hits in the corpus.
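What the frequency list of node forms shows can be imitated with `collections.Counter`. This is only a sketch: `node_form_frequencies` is a hypothetical helper, and the tokens are an invented sample, not counts from EEBO.

```python
from collections import Counter

def node_form_frequencies(tokens, chars):
    """Count each word form containing chars, most frequent first."""
    hits = [t for t in tokens if chars in t]
    return Counter(hits).most_common()

# Invented sample for illustration only.
sample = ["blood", "bloody", "blood", "bloodshed", "blood", "bloody", "flood"]

table = node_form_frequencies(sample, "blood")
for form, count in table:
    print(form, count)
```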
Query type: CQL
Corpus query language or CQL is the most universal query type that we can use when searching the EEBO corpus. All of the aforementioned query types can be converted into CQL in the KonText interface. How to use CQL will be explained in more advanced lessons of this tutorial (Lesson 2).
Basic information about the EEBO corpus
If we wish to find out basic information about the corpus we are using (e.g. EEBO), we can click on the name of the corpus located beneath the KonText icon. After clicking on the EEBO button, a window containing basic information about EEBO is displayed, where we can learn about the size of the corpus or find out which metadata are available for it.
If you are ready, you can continue to Lesson 2.