en:obc:spell3, revision of 2020/02/27 12:21 by jankocek (previous revision: 2020/02/14 17:37, jankocek)
===== Lesson 4: Spelling III (Searching with tags) =====
  
The OBC was part-of-speech tagged using the [[http://ucrel.lancs.ac.uk/claws7tags.html|CLAWS 7]] tagset. Each word (defined here as an uninterrupted string of characters, excluding apostrophes and hyphens, delimited by punctuation or white space) is assigned a tag specifying the part of speech identified in the given context. Because tagging is an automatic process, you may encounter some inaccuracies, but these should be few. For more information, see the [[https://fedora.clarin-d.uni-saarland.de/oldbailey/downloads/OBC_2.0_Manual%202016-07-13.pdf|OBC Manual]].
  
Keep in mind that synthetic genitives such as //mother’s// and contracted forms such as //don’t// are counted as two words, since the CLAWS system splits them into separate items. To find //mother’s//, the query must therefore be written as follows: ''[word="mother"] [word="'s"]''
  
These split-off items have their own tags: GE for the genitive //’s// and XX for the negative //n’t//. However, past tense and past participle forms involving apostrophes, such as //cry’d//, are counted as one word.
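The following minimal Python sketch shows why the split matters for queries: each position in a sequence query such as ''[word="mother"] [word="'s"]'' has to match one whole token. The ''find_sequence'' helper and the token lists are hypothetical, and whole-token matching is an assumption imitated with ''re.fullmatch''; this is not how KonText itself is implemented.

```python
import re

def find_sequence(tokens, patterns):
    """Return the start indices where consecutive tokens fully match the patterns.

    Hypothetical helper: each pattern stands for one [word="..."] position
    in a CQL sequence query.
    """
    n = len(patterns)
    hits = []
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        if all(re.fullmatch(p, tok) for p, tok in zip(patterns, window)):
            hits.append(start)
    return hits

# With the genitive split off, the two-position query matches at index 1.
print(find_sequence(["the", "mother", "'s", "house"], ["mother", "'s"]))  # [1]
# An unsplit token "mother's" matches neither position on its own.
print(find_sequence(["the", "mother's", "house"], ["mother", "'s"]))     # []
```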
  
**Searching with tags**
  
In the previous lessons, you worked with specific individual words. Searching with tags allows you to examine a given phenomenon across whole classes of words.
  
Let’s take a look at the contracted past tense and past participle ending //’d//. By examining the tagset, we can see that it distinguishes all the different types of verbs: modal auxiliaries are tagged as VM, infinitives as VVI, and so on. The standard tagset, however, has no specific tag for the contracted forms we are looking for. The easiest way to find them is therefore to search for all verbs that end in //’d//. To do so, use [[https://wiki.korpus.cz/doku.php/en:pojmy:regularni_vyrazy|regular expressions]], in particular the sequence ''.*''. The full stop ''.'' represents any single character, while the asterisk ''*'' matches zero, one or more repetitions of the previous character. Together, ''.*'' matches any string of characters, including an empty one, and can therefore stand for any part of a word or a tag. The tagset shows that all verb tags begin with V, so we can replace the rest of the tag with ''.*'' to match any verbal tag.
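You can try out the effect of ''V.*'' with Python's ''re'' module. This is only a sketch: it assumes that, as in CQL, a regular expression must match the whole attribute value, which ''re.fullmatch'' imitates, and the tags are a small hand-picked sample of CLAWS 7 tags rather than corpus output.

```python
import re

# A few CLAWS 7 tags; only the verb tags begin with "V".
TAGS = ["VM", "VVI", "VVD", "VVN", "PN1", "NN2", "GE", "XX"]

def matches(pattern, value):
    """True if the regex matches the whole value (assumed CQL-style matching)."""
    return re.fullmatch(pattern, value) is not None

verb_tags = [t for t in TAGS if matches("V.*", t)]
print(verb_tags)  # ['VM', 'VVI', 'VVD', 'VVN']

# ".*" may match zero characters, so a bare "V" would also match.
print(matches("V.*", "V"))  # True
```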
  
Make sure the query type is set to CQL. You may also set the default attribute below the search window to //tag//, although this is not necessary. If you do, the square brackets and the attribute name can be left out of the query (i.e. you can type only ''"V.*"'' into the search box). You may start the query as follows:
  
''[tag="V.*"]''
  
This query alone would find all verbs in the corpus, but we need to limit the search to verbs ending in //’d//. For this, you can use the ampersand (''&''), which stands for logical AND. When you connect two or more attribute conditions with ''&'', the resulting concordance includes only those occurrences that fulfil all the conditions specified in the query. The second condition uses the //word// attribute; to match any word ending in //’d//, we can use another regular expression. This time we use ''+'' instead of ''*'', since ''+'' matches one or more repetitions of the previous character; this way, a bare //’d// with nothing before it will not appear in the concordance.
  
''[tag="V.*" & word=".+'d"]''
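The following Python sketch mimics this query on a handful of made-up tokens. It rests on two assumptions: that CQL requires each regular expression to match the whole attribute value (imitated here with ''re.fullmatch''), and that a split-off //’d// (as in //I’d//) might itself carry a verb tag; the tokens and tags below are illustrative only, not OBC data. Swapping ''+'' for ''*'' shows why the former is used: with ''.*'', the bare //’d// slips into the results.

```python
import re

def full(pattern, value):
    """True if the pattern matches the whole value (assumed CQL-style matching)."""
    return re.fullmatch(pattern, value) is not None

# Hypothetical (word, tag) pairs; the tags are illustrative, not taken from the OBC.
tokens = [
    ("he", "PPHS1"),
    ("cry'd", "VVX"),
    ("out", "RP"),
    ("I", "PPIS1"),
    ("'d", "VM"),  # assumed: a split-off 'd tagged as a verb
    ("call'd", "VVX"),
    ("the", "AT"),
    ("constable", "NN1"),
]

def query(word_pattern):
    """Mimic [tag="V.*" & word=word_pattern]: both conditions must hold."""
    return [w for w, t in tokens if full("V.*", t) and full(word_pattern, w)]

print(query(".+'d"))  # ["cry'd", "call'd"]
print(query(".*'d"))  # ["cry'd", "'d", "call'd"]
```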
  
With this query, we are searching for all words that are tagged as verbs and at the same time end in //’d//. The query returns 51,705 hits, with a relative frequency of 1,459.09.
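Relative frequency normalises the raw number of hits by the size of the corpus, so that results from corpora or subcorpora of different sizes can be compared. To our understanding, KonText expresses it as instances per million tokens (i.p.m.); the minimal sketch below only shows that arithmetic, and the corpus sizes in it are made-up round numbers, not the size of the OBC.

```python
def ipm(hits, corpus_size):
    """Relative frequency in instances per million tokens."""
    # Multiply before dividing to keep the result exact for round numbers.
    return hits * 1_000_000 / corpus_size

print(ipm(50, 1_000_000))  # 50.0
print(ipm(3, 2_000_000))   # 1.5
```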
  
To view the tags of any of the words included in the concordance, hover over the individual words or elements.
  
{{:en:obc:l4_1.png?direct&600|}}
  
Here, the word //something// is tagged as PN1, which corresponds to //indefinite pronoun, singular// in the tagset.

You can change how these tags are displayed by clicking on //View → Corpus-specific settings// and selecting a different option under //How to display additional positional attributes?//.
  
You may have noticed that the forms you searched for (the KWIC, i.e. keyword in context) are tagged as VVX. This tag is not part of the standard CLAWS 7 tagset; it was added specifically for the OBC during the tagging process. To read more about the corrections made to the tagging, see page 12 of the [[https://fedora.clarin-d.uni-saarland.de/oldbailey/downloads/OBC_2.0_Manual%202016-07-13.pdf|OBC Manual]]. Because VVX also begins with V, the query above still finds these forms. Even so, it is advisable not to rely on the tagset alone, but to check how the relevant strings are actually tagged and build your query accordingly.
  
Let’s check the frequency list (//Frequency → Node forms [A=a]//) to see which verbs are most commonly contracted in this way.

{{:en:obc:l4_2.png?direct&300|}}
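Conceptually, a node-form frequency list counts how often each distinct KWIC form occurs and sorts the forms by that count. KonText builds the list for you; the sketch below, using Python's ''collections.Counter'' on a hypothetical list of node forms (not real OBC results), only illustrates the idea.

```python
from collections import Counter

# Hypothetical node forms returned by a query (not real OBC data).
node_forms = ["call'd", "ask'd", "call'd", "depos'd", "call'd", "ask'd"]

freq = Counter(node_forms)
for form, count in freq.most_common():  # most frequent first
    print(form, count)
# call'd 3
# ask'd 2
# depos'd 1
```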

To compare the frequency of the contracted forms with the full forms, let’s do a quick search for the full forms of the four most frequent contracted verbs:

''[word="deposed" | word="asked" | word="called" | word="robbed"]''
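The pipe ''|'' in this query stands for logical OR: a token is matched if any one of the conditions holds. The Python sketch below, again assuming whole-value matching via ''re.fullmatch'', checks that listing the alternatives as separate conditions behaves the same as a single regular expression with alternation, which in CQL could also be written as ''[word="deposed|asked|called|robbed"]''.

```python
import re

FULL_FORMS = ["deposed", "asked", "called", "robbed"]

def any_condition(word):
    # Like [word="deposed" | word="asked" | ...]: true if any condition holds.
    return any(re.fullmatch(form, word) for form in FULL_FORMS)

def alternation(word):
    # Like [word="deposed|asked|called|robbed"]: one regex with alternatives.
    return re.fullmatch("|".join(FULL_FORMS), word) is not None

for w in ["asked", "ask'd", "called", "recalled"]:
    print(w, any_condition(w), alternation(w))
# asked True True
# ask'd False False
# called True True
# recalled False False
```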

Go to the frequency list (//Frequency → Node forms [A=a]//) and compare:
  
{{:en:obc:l4_3.png?direct&300|}}
  
<WRAP round help 40%>
**Task:**

Try to find all plural nouns in the genitive case that are formed with the //’s// suffix.
  * Keep in mind the different tags for different classes of nouns.
  * Make sure the query type is set to CQL.
  * Notice the spelling conventions – can you find an example in which the genitive //’s// follows the plural //-s//? How frequent is it?
</WRAP>
  
You can find the solution [[en:obc:solution|here]].
  
  
----
  
**If you are ready, you can continue to [[en:obc:intro_to_metadata|Lesson 5]].**
  
----