en:eebo:first_query [2016/11/10 20:21] – [Corpus selection] kristinavalentinyova | en:eebo:first_query [2018/07/30 14:40] (current) – vaclavcvrcek |
---|
======== Lesson 1: Introduction ======== | ======== Lesson 1: Introduction ======== |
| |
This page provides a basic overview of how to use the EEBO corpus and how to input the data into the search interface using different query types. | This page provides a basic overview of how to use the [[en:cnk:eebo|EEBO]] corpus and how to input the data into the search interface using different query types. |
| |
====== Corpus selection ====== | ====== Corpus selection ====== |
| |
After successfully completing the [[https://www.korpus.cz/toolbar/signup.php|online registration]] and logging into KonText, we can begin with our very first query in the EEBO corpus. First of all, we need to select the corpus we intend to work with. The default corpus of KonText is the //syn2015// corpus. By clicking on the icon //syn2015//, a menu appears with all of the available corpora. If we have worked with KonText before, we might also see the list of your favorite corpora located on the left side of the menu. The list of featured corpora is located on the right side. If the EEBO corpus is not included in either of the lists, we click on the icon //all corpora// and a search box will appear where we can type in a part of the name or description of the corpus we plan to work with. We type in "EEBO" and select the corpus in the dropdown menu. By clicking on the star next to the icon with the selected corpus, we can add the EEBO corpus to the list of our favorite corpora. Next time we work with KonText, the EEBO corpus will be included in the list of //my favorite corpora//. Now we can type the query in the **Query** box. | After successfully completing the [[https://www.korpus.cz/toolbar/signup.php|online registration]] and logging into [[en:manualy:kontext:index|KonText]], we can begin with our very first query in the EEBO corpus. First of all, we need to select the corpus we intend to work with. The default corpus of KonText is the //syn2015// corpus. By clicking on the icon //syn2015//, a menu appears with all of the available corpora. If we have worked with KonText before, we might also see the list of //my favorite corpora// located on the left side of the menu. The list of //featured corpora// is located on the right side. If the EEBO corpus is not included in either of the lists, we click on the icon //all corpora// and a search box will appear where we can type in a part of the name or description of the corpus we plan to work with. 
We type in "EEBO" and select the corpus in the dropdown menu. By clicking on the star next to the icon with the selected corpus, we can add the EEBO corpus to the list of our favorite corpora. Next time we work with KonText, the EEBO corpus will be included in the list of //my favorite corpora//. Now we can type the query in the **Query** box. |
| |
| |
===== First query ===== | ===== First query ===== |
| |
You can type any word or combination of words into the query line of [[en:manualy:kontext:index|KonText interface]] and observe how often the desired phenomenon occurs. Just click on the **Search** button or press the **Enter** key. | Now we can type any word or combination of words into the query line of the [[en:manualy:kontext:index|KonText interface]] and observe how often the desired phenomenon occurs. Just click on the **Search** button or press the **Enter** key. |
| |
[{{eebo-2.png?500|Form for creating a query}}] | [{{eebo-2.png?500|Form for creating a query}}] |
<WRAP round help 40%> | <WRAP round help 40%> |
| |
**You can try to find in the EEBO corpus** | **We can try to find in the EEBO corpus** |
- names of English monarchs ruling in the period 1400-1700 | - names of English monarchs ruling in the period 1400-1700 |
- punctuation marks such as a question mark -- //?// (for interrogative sentences) or an exclamation mark -- //!// | - punctuation marks such as a question mark -- //?// (for interrogative sentences) or an exclamation mark -- //!// |
- a word that has since changed its meaning | - words that have changed their meaning since the Early Modern English period |
* //silly// which once meant //worthy// or //blessed// | * //silly// which once meant //worthy// or //blessed// |
* //myriad// which once referred to a specific number, i.e. 10,000 | * //myriad// which once referred to a specific number, i.e. 10,000 |
<WRAP clear/> | <WRAP clear/> |
| |
You can check if your query search worked correctly (corpus: EEBO, query type: basic): | We can now check if the query search worked correctly (corpus: EEBO, query type: basic): |
| |
| |
| '' meat'' | 41,990 | 48.14 | | | '' meat'' | 41,990 | 48.14 | |
| |
It should be noted that the EEBO corpus contains approximately 730 million words and therefore the word //myriad// (with 42 occurrences) has a relative [[en:pojmy:frekvence|frequency]] of 0.05 instances per million (i.p.m.). Relative frequency is essential when working with corpora of different sizes as 10 hits in the corpus containing 100 million words does not equal the frequency of 10 hits in the corpus containing twice as many words. | It should be noted that the EEBO corpus contains approximately 872 million words and therefore the word //myriad// (with 42 occurrences) has a relative [[en:pojmy:frekvence|frequency]] of approximately 0.05 instances per million (i.p.m.). Relative frequency is essential when comparing corpora of different sizes: 10 hits in a corpus of 100 million words correspond to 0.1 i.p.m., whereas 10 hits in a corpus twice as large correspond to only 0.05 i.p.m. |
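The arithmetic behind the i.p.m. value can be sketched in a few lines of Python. This is only an illustration and not part of KonText: the function name ''instances_per_million'' is hypothetical, and the corpus size is the rounded figure of 872 million words quoted above.

```python
def instances_per_million(hits: int, corpus_size: int) -> float:
    """Relative frequency expressed in instances per million (i.p.m.)."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return hits * 1_000_000 / corpus_size


# Rounded corpus size taken from the text above (an approximation).
EEBO_SIZE = 872_000_000

# 42 occurrences of "myriad" in roughly 872 million words.
myriad_ipm = instances_per_million(42, EEBO_SIZE)
print(round(myriad_ipm, 2))

# The same absolute frequency in corpora of different sizes
# gives different relative frequencies.
small_corpus_ipm = instances_per_million(10, 100_000_000)
large_corpus_ipm = instances_per_million(10, 200_000_000)
print(small_corpus_ipm, large_corpus_ipm)
```

Because the corpus size used here is rounded, values computed this way may differ in the last decimal place from those displayed by KonText.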
| |
The searched word or phrase which is pink-coloured in our interface is called [[en:pojmy:kwic|KWIC]] (key word in context). The whole line is called concordance line and is part of the [[en:pojmy:konkordance|concordance]] (the list of all concordance lines, i.e. all occurrences of the searched words as well as their contexts). | The searched word or phrase, which is highlighted in pink in the concordance list, is called the [[en:pojmy:kwic|KWIC]] (key word in context). The whole line is called a concordance line and is part of the [[en:pojmy:konkordance|concordance]] (the list of all concordance lines, i.e. all occurrences of the searched words together with their contexts). |
[{{eebo-3.png?500|Concordance list for //silly//}}] | [{{eebo-3.png?500|Concordance list for //silly//}}] |
| |
====== Query types ====== | ====== Query types ====== |
| |
There are 6 different query types in the KonText interface (//basic, lemma, phrase, node form, character, CQL//). Each of them is suitable for different kinds of research. As the EEBO corpus is not lemmatized, it is not possible to select //lemma// as the query type. | There are 6 different query types in the KonText interface (//basic, lemma, phrase, word form, character, CQL//). Each of them is suitable for different kinds of research. As the EEBO corpus is not lemmatized, it is not possible to select //lemma// as the query type. |
| |
| |
===== Query type: word Form ===== | ===== Query type: Word Form ===== |
| |
[[en:pojmy:word|Word Form]] is one of the most user-friendly query types. With Word Form we can search in the corpus for the specific form of the query. If we type //apple// into the query line, only those occurrences of the word will appear that exactly match the query. Therefore, //Apple// with upper-case //A// will not be included in the generated results. | [[en:pojmy:word|Word Form]] is one of the most user-friendly query types. With Word Form we can search the corpus for one specific form of a word. If we type //apple// into the query line, the results will contain only those occurrences that match the query exactly. Therefore, when letter case is matched (see below), //Apple// with upper-case //A// will not be included in the generated results. |
| |
The only difference between the query and the result could be letter case. The default setting for Word Form is [[wp>Case_sensitivity|case-sensitive]] which means that the results will include both lower and upper case forms, i.e. (the query results of //god// will include both //god//,//God// but also //GOD//. | The only difference between the query and the result could be letter case. The default setting for Word Form is [[wp>Case_sensitivity|case-insensitive]], which means that the results will include both lower- and upper-case forms, i.e. the results for the query //god// will include //god// and //God// as well as //GOD//. |
| |
In order for the query to be [[wp>Case_sensitivity|case-sensitive]] we need to tick the box **Match case** located beneath the query line. If we enter //James// with upper-case ''J'', the concordance list will include only the exact match //James//, excluding //james//. | In order for the query to be [[wp>Case_sensitivity|case-sensitive]] we need to tick the box **Match case** located beneath the query line. If we enter //James// with upper-case ''J'', the concordance list will include only an exact match, i.e. //James//, excluding //james//. |
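The effect of the **Match case** box can be imitated with a small Python sketch. It is not KonText's implementation: the function ''matches'' and the toy token list are assumptions made purely for illustration.

```python
def matches(query: str, token: str, match_case: bool = False) -> bool:
    """Compare a query with a token, optionally ignoring letter case."""
    if match_case:
        return token == query
    # casefold() is a more thorough variant of lower() for caseless comparison.
    return token.casefold() == query.casefold()


# Toy token list, not real corpus data.
tokens = ["god", "God", "GOD", "gods", "James", "james"]

# Default behaviour described above: letter case is ignored.
default_hits = [t for t in tokens if matches("god", t)]

# With "Match case" ticked: only the exact form is returned.
exact_hits = [t for t in tokens if matches("James", t, match_case=True)]

print(default_hits)
print(exact_hits)
```

In both cases the form //gods// is left out, because a Word Form query matches whole word forms only.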
| |
| |
| |
| |
Basic query is ideal for elementary searches which do not require a very high degree of accuracy (in many respects this query type is equivalent to the basic search engines such as google). In the case of a dictionary form (lemma), all of its possible forms are searched for. As the EEBO corpus is not lemmatized, this option is not possible. Therefore, only those forms appear in the results that absolutely match the query. | The basic query is ideal for elementary searches which do not require a very high degree of accuracy (in many respects this query type is comparable to a simple search engine such as //Google//). If the query is a dictionary form ([[en:pojmy:lemma|lemma]]), all of its possible forms are searched for, such as //slept//, //sleeping// and //sleeps// for the lemma //sleep//. As the EEBO corpus is not lemmatized, this option is not available. Therefore, only those forms that match the query exactly appear in the results. |
| |
| |
| |
===== Query type: phrase ===== | ===== Query type: Phrase ===== |
| |
| |
| |
| |
<wrap lo> In [[en:pojmy:dotazovaci_jazyk|CQL]] syntax the equivalent of this query would be: ''[word=<nowiki>"</nowiki>almighty<nowiki>"</nowiki>][word=<nowiki>"</nowiki>god<nowiki>"</nowiki>]''.</wrap> | <wrap lo> In [[en:pojmy:dotazovaci_jazyk|CQL]] syntax, the equivalent of this query would be: ''[word=<nowiki>"</nowiki>almighty<nowiki>"</nowiki>][word=<nowiki>"</nowiki>god<nowiki>"</nowiki>]''.</wrap> |
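The correspondence between a phrase and its CQL form can be sketched in Python. The helper ''phrase_to_cql'' is hypothetical and only mirrors the pattern shown above (one ''[word="..."]'' position per word); it does not reproduce how KonText performs the conversion internally, and it does not escape quotation marks or regular-expression characters.

```python
def phrase_to_cql(phrase: str) -> str:
    """Build one [word="..."] position for every whitespace-separated word."""
    tokens = phrase.split()
    return "".join(f'[word="{token}"]' for token in tokens)


print(phrase_to_cql("almighty god"))
```

The positions are written directly one after another, which in CQL means that the words must follow each other immediately in the text.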
| |
===== Query type: Character ===== | ===== Query type: Character ===== |
| |
If we wish to find all of the words that contain a string of consecutive characters (e.g. a root), then this query type is the most suitable for this kind of searches. With character query type we can find all of the words that contain those characters and are preceded or followed by any number of characters (or none). | If we wish to find all of the words that contain a string of consecutive characters (e.g. the root of a word), then this query type is the most suitable one. With the character query type we can find all of the words that contain the given characters, preceded or followed by any number of other characters (or none). |
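The idea of the character query can be illustrated with a short Python sketch: the searched string may be preceded and followed by any number of characters, which corresponds to the regular expression ''.*string.*''. The function ''character_query'' and the toy word list are assumptions for illustration only; how KonText handles letter case or special characters in this query type is not covered here.

```python
import re


def character_query(fragment: str, words: list[str]) -> list[str]:
    """Return the words containing the fragment anywhere inside them."""
    # re.escape() makes sure the fragment is treated as plain characters.
    pattern = re.compile(".*" + re.escape(fragment) + ".*")
    return [word for word in words if pattern.fullmatch(word)]


# Toy word list, not real corpus data.
words = ["love", "beloved", "lovely", "glove", "live"]

hits = character_query("love", words)
print(hits)
```

The word //live// is not returned, because the characters of the fragment must appear consecutively and in the same order.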
| |
| |
===== Query type: CQL ===== | ===== Query type: CQL ===== |
| |
Corpus query language or [[en:pojmy:cql|CQL]] is the most universal query type that we can use when searching the EEBO corpus. All of the aforementioned query types are converted into CQL in the KonText interface. How to use CQL will be explained in more advanced lessons of this tutorial. | Corpus query language or [[en:pojmy:cql|CQL]] is the most universal query type that we can use when searching the EEBO corpus. All of the aforementioned query types can be converted into CQL in the KonText interface. How to use CQL will be explained in more advanced lessons of this tutorial ([[en:eebo:orthography_spelling|Lesson 2]]). |
| |
===== Basic information about the EEBO corpus ===== | ===== Basic information about the EEBO corpus ===== |
| |
If we wish to find out basic information about the corpus we are using (e.g. EEBO), | If we wish to find out basic information about the corpus we are using (e.g. EEBO), |
we can click on the name of the corpus located beneath the KonText icon. | we can click on the name of the corpus located beneath the KonText icon {{kurz:logo_kontext_male.png}}. |
A window containing basic information about EEBO will be displayed | A window containing basic information about EEBO will be displayed |
after clicking on the EEBO button. We can learn about the size of the corpus | after clicking on the EEBO button. We can learn about the size of the corpus |
| |
| |
| ---- |
| |
| **If you are ready, you can continue to [[en:eebo:orthography_spelling|Lesson 2]].** |
| |
---- | ---- |