Differences
This shows you the differences between two versions of the page.
en:eebo:collocations [2016/09/22 07:16] – [Finding collocations of a word] kristinavalentinyova | en:eebo:collocations [2018/07/30 14:49] (current) – vaclavcvrcek | ||
---|---|---|---|
Line 1: | Line 1: | ||
======== Lesson 5: Collocations ======== | ======== Lesson 5: Collocations ======== | ||
+ | |||
In this section we will have a look at [[en: | In this section we will have a look at [[en: | ||
In any language, every word tends to co-occur with certain words more often than with others. John R. Firth, an English linguist, famously said that "You shall know a word by the company it keeps" (Firth, J. R. 1957:11). What he meant was that the contexts in which a word occurs can tell us a great deal about the word itself. | In any language, every word tends to co-occur with certain words more often than with others. John R. Firth, an English linguist, famously said that "You shall know a word by the company it keeps" (Firth, J. R. 1957:11). What he meant was that the contexts in which a word occurs can tell us a great deal about the word itself. | ||
- | If we think of the word //tea// the following phrases will come to mind: black tea, strong tea, cup of tea or even coffee. These are all collocations of the word //tea//. Using the corpus, we can statistically prove that some words collocate with the word //tea// while the others do not. | + | If we think of the word //tea// the following phrases will come to mind: black tea, strong tea, cup of tea or tea leaves. These are all collocations of the word //tea//. Using the corpus, we can statistically prove that some words are more likely to collocate with the word //tea// while the others do not sound very native-like such as //powerful tea//, even though // |
======= Finding collocations of a word ======= | ======= Finding collocations of a word ======= | ||
Thanks to the corpus linguistics, | Thanks to the corpus linguistics, | ||
- | Let's try finding collocations of a word such as //bread//. We select EEBO as the corpus we wish to work with and then use basic query type. After clicking on the search button, concordance lines will appear. //Bread//, a key word, is always located in the middle of the line (higlighted in pink). We then click on the **collocations** button located in the upper menu and select **custom** from the dropdown menu. | + | |
+ | Let's try finding collocations of a word such as //bread//. We select | ||
[{{eebo-9.png? | [{{eebo-9.png? | ||
- | First of all we need to decide in which range will we be searching for collocations. The default range **from -3 to 3** from KWIC is suitable for the majority of searches. If we wish to find out what kind of adjectives collocate with the word //bread// we need to set the range **from -1 to -1**. What this setting does is that it restricts the span to the first position to the left from the key word. Therefore we should be able to determine which adjectives frequently modifed the word //bread// in the period that EEBO covers. | + | First of all, we need to decide in which range we will be searching for collocations. The default range **from -3 to 3** from KWIC is suitable for the majority of searches. If we wish to find out what kind of adjectives collocate with the word //bread//, we need to set the range **from -1 to -1**. This setting restricts the span to the first position to the left of the key word. We should therefore be able to determine which adjectives frequently modified the word //bread// in the period that EEBO covers. |
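To make the idea of a range more concrete, here is a minimal Python sketch of how collocate candidates could be counted around each occurrence of a key word. It is only an illustration and not how the corpus interface is implemented: the function name `collocates_in_range`, its parameters and the toy sentence are all assumptions made for this example.

```python
from collections import Counter

def collocates_in_range(tokens, keyword, left=-3, right=3):
    """Count the words found at positions left..right around each keyword hit.

    Hypothetical helper: negative offsets are positions before the key word,
    positive offsets are positions after it.
    """
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        for offset in range(left, right + 1):
            j = i + offset
            # Skip the key word itself and positions outside the text.
            if offset == 0 or j < 0 or j >= len(tokens):
                continue
            counts[tokens[j]] += 1
    return counts

# Toy sentence invented for the example, not taken from EEBO.
tokens = "give us this day our daily bread and eat unleavened bread".split()

# Range from -1 to -1: only the word immediately to the left of "bread".
adjacent = collocates_in_range(tokens, "bread", left=-1, right=-1)

# Default range from -3 to 3: three positions on either side.
wide = collocates_in_range(tokens, "bread")
```

With the range from -1 to -1 only //daily// and //unleavened// are counted, while the wider range also picks up the surrounding function words.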
- | Under the heading **show functions** we can choose which measures of association we wish to be calculated for bread. [[en: | + | Under the heading **show functions:** we can choose which measures of association we wish to be calculated for //bread//. [[en: |
+ | |||
+ | For example, we can select the following measures: | ||
+ | |||
+ | |||
+ | If you sorted the list according to the log-likelihood, | ||
+ | |||
+ | |||
+ | |||
+ | [{{eebo-18.png? | ||
- | If you sorted the list according to the log-likelihood, | ||
- | - of | ||
- | - the | ||
- | - unleavened | ||
- | - daily | ||
- | - , (comma) | ||
<WRAP round tip 40%> | <WRAP round tip 40%> | ||
- | Don't worry about the grammatical words and punctuation marks in the first positions. Function words such as prepositions and articles are the most common words in any language and therefore they frequently co-occur with //bread//. As the EEBO corpus is not lemmatized, it is not possible to restrict the search to adjectives and nouns only. | + | Don't worry about the grammatical words and punctuation marks in the first positions. Function words such as prepositions and articles are the most common words in any language and therefore they frequently co-occur with any word, even //bread//. As the EEBO corpus is not lemmatized, it is not possible to restrict the search to adjectives and nouns only. |
</ | </ | ||
- | In the list of the first 50 collocation candidates, there are other words that frequently modify the word //bread// such as //this, childrens, grated, common, Sacramental, | + | In the list of the first 50 collocation candidates, there are other words that frequently modify the word //bread// such as //this, childrens, grated, common, Sacramental, |
<WRAP round help 40%> | <WRAP round help 40%> | ||
Let's try searching for collocates of the following words in the EEBO corpus: | Let's try searching for collocates of the following words in the EEBO corpus: | ||
* tea | * tea | ||
* war | * war | ||
- | You can also alter the range within which you wish to search for the collocates. | + | We can always modify |
</ | </ | ||
======= Association measures ======= | ======= Association measures ======= | ||
- | Association measures are used to identify a collocation. | + | Association measures are used to identify a collocation. |
^ Collocate ^ Frequency | ^ Collocate ^ Frequency | ||
Line 45: | Line 50: | ||
How can we interpret these results? | How can we interpret these results? | ||
- | * **MI** prefers | + | * **MI** prefers words with lower frequency and therefore the results |
* **T-score** is based on the co-occurrence frequency and therefore the results of T-score and frequency almost coincide. This association measure prefers words with a high frequency and therefore there are mostly grammatical words and punctuation marks in the first positions. Established collocations may be found in the lower positions of the list. | * **T-score** is based on the co-occurrence frequency and therefore the results of T-score and frequency almost coincide. This association measure prefers words with a high frequency and therefore there are mostly grammatical words and punctuation marks in the first positions. Established collocations may be found in the lower positions of the list. | ||
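The different preferences of the two measures can be seen in a small calculation. The sketch below uses the commonly cited textbook formulas for MI and T-score; whether the corpus interface computes them in exactly this way is an assumption, and all the frequencies are invented for illustration.

```python
import math

def mi_score(f_xy, f_x, f_y, n):
    """MI: log2 of the observed co-occurrence frequency over the expected one."""
    return math.log2(f_xy * n / (f_x * f_y))

def t_score(f_xy, f_x, f_y, n):
    """T-score: (observed - expected) divided by the square root of observed."""
    expected = f_x * f_y / n
    return (f_xy - expected) / math.sqrt(f_xy)

# Invented counts: a corpus of 1,000,000 tokens, a key word occurring 1,000 times.
N = 1_000_000
F_KEYWORD = 1_000

# A rare collocate (20 occurrences, 10 of them next to the key word) ...
rare = {"f_xy": 10, "f_y": 20}
# ... and a frequent grammatical word (50,000 occurrences, 200 near the key word).
frequent = {"f_xy": 200, "f_y": 50_000}

mi_rare = mi_score(rare["f_xy"], F_KEYWORD, rare["f_y"], N)
mi_frequent = mi_score(frequent["f_xy"], F_KEYWORD, frequent["f_y"], N)
t_rare = t_score(rare["f_xy"], F_KEYWORD, rare["f_y"], N)
t_frequent = t_score(frequent["f_xy"], F_KEYWORD, frequent["f_y"], N)
```

With these invented numbers the rare word gets the higher MI, while the frequent word gets the higher T-score, which matches the tendencies described above.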
Line 53: | Line 58: | ||
* The negative numbers indicate the positions preceding the key word, while the positive ones refer to the right positions. | * The negative numbers indicate the positions preceding the key word, while the positive ones refer to the right positions. | ||
* Minimum frequency in corpus: establishes minimum overall frequency of a unit in order to be included in the collocate list | * Minimum frequency in corpus: establishes minimum overall frequency of a unit in order to be included in the collocate list | ||
- | * Minimum frequency in given range: provided that we specified the context span for collocate search from -3 to 3, then the minimum frequency in given range optiom | + | * Minimum frequency in given range: provided that we specified the context span for collocate search from -3 to 3, then the minimum frequency in given range option |
</ | </ | ||
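The two minimum frequency settings can be thought of as simple filters applied to the list of collocate candidates. The following Python sketch shows one possible reading of them; the function `apply_thresholds`, the threshold values and the frequencies are assumptions made for the example, not values taken from the interface or from EEBO.

```python
def apply_thresholds(candidates, min_corpus_freq, min_range_freq):
    """Keep only candidates that reach both minimum frequencies.

    candidates maps a collocate to a pair:
    (frequency in the whole corpus, frequency within the given range).
    """
    return {
        word: freqs
        for word, freqs in candidates.items()
        if freqs[0] >= min_corpus_freq and freqs[1] >= min_range_freq
    }

# Invented frequencies for illustration only.
candidates = {
    "daily": (5000, 120),
    "unleavened": (900, 80),
    "grated": (300, 2),
}

kept = apply_thresholds(candidates, min_corpus_freq=500, min_range_freq=3)
```

Here //grated// is dropped because it fails both thresholds, while the other two candidates stay in the list.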
+ | |||
+ | <WRAP round help 40%> | ||
+ | Look at the lists of words below. Using the EEBO corpus, find out which words collocate with the following three near synonyms: //godly, divine or sacred//? | ||
+ | </ | ||
+ | |||
+ | Each of the synonyms is used in slightly different contexts as can be inferred | ||
+ | * Set the range **from -3 to 3** | ||
+ | * Sort by **logDice** | ||
+ | |||
+ | ^Near synonyms ^ ^ | ||
+ | ^1st collocate |sorrow|Majesty|Nature| | ||
+ | ^ 2nd collocate |learned|Majeſty|Providence| | ||
+ | ^ 3rd collocate |man|Scriptures|Service| | ||
+ | ^ 4th collocate|Miniſters|Writ|Revelation| | ||
+ | ^5th collocate |men|Person|humane| | ||
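Sorting by logDice can also be sketched in a few lines of Python. The formula used here is the commonly published definition of logDice (14 plus the base-2 logarithm of the Dice coefficient); the helper names and all the frequencies are invented for illustration and do not come from EEBO.

```python
import math

def log_dice(f_xy, f_x, f_y):
    """logDice: 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))

def rank_by_log_dice(keyword_freq, candidates):
    """Return the collocates ordered from the highest logDice to the lowest.

    candidates maps a collocate to a pair:
    (co-occurrence frequency with the key word, frequency of the collocate).
    """
    scored = {
        word: log_dice(f_xy, keyword_freq, f_y)
        for word, (f_xy, f_y) in candidates.items()
    }
    return sorted(scored, key=scored.get, reverse=True)

# Invented frequencies: the key word occurs 20,000 times.
candidates = {
    "sorrow": (400, 3_000),
    "man": (900, 200_000),
    "the": (5_000, 9_000_000),
}

ranking = rank_by_log_dice(20_000, candidates)
```

With these invented numbers the very frequent word //the// ends up last even though it co-occurs with the key word most often, because logDice takes the overall frequency of both words into account.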
+ | |||
+ | ---- | ||
+ | |||
+ | **If you are ready, you can continue to [[en: | ||
+ | |||
+ | ---- | ||