In the previous lessons, you worked with individual, specific words. Searching by tags lets you examine a given phenomenon as it occurs across whole classes of words.
  
Let’s take a look at the contracted past tense and past participle ending //'d//. The tagset distinguishes many types of verbs: modal auxiliaries are tagged as VM, infinitive forms as VVI, and so on. However, the standard tagset has no specific tag for the contracted forms we are looking for. The easiest way to find them is therefore to search for all verbs that end in //'d//. To do so, use [[https://wiki.korpus.cz/doku.php/en:pojmy:regularni_vyrazy|regular expressions]], in particular the sequence ''.*''. The full stop ''.'' represents any single character, while the asterisk ''*'' matches zero, one or more repetitions of the preceding character. Together, ''.*'' matches any string of characters, including an empty one, so it can stand for any part of a word or a tag. All verb tags in the tagset begin with V, so we can replace the rest of the tag with ''.*'' to match any verbal tag.
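
The following Python sketch illustrates how ''V.*'' behaves on a few tags. It is only an analogy: KonText does not run Python, and Python's ''re'' module merely stands in for the CQL regular expression engine, which behaves the same way for a simple pattern like this one. ''re.fullmatch'' is used because in CQL the pattern has to match the whole tag, not just its beginning. The tag list is a small hand-picked sample (VM, VVI and VVX from this lesson, PN1 and NN1 as non-verbal tags), not data from the corpus.

<code python>
import re

# A hand-picked sample of tags; not taken from the corpus itself.
tags = ["VM", "VVI", "VVX", "PN1", "NN1"]

# "V" followed by any number of any characters: every tag starting with V.
# re.fullmatch stands in for CQL, where the pattern must cover the whole tag.
verb_tags = [t for t in tags if re.fullmatch(r"V.*", t)]
print(verb_tags)  # ['VM', 'VVI', 'VVX']

# Without ".*", only a tag consisting of the single letter V would match,
# and there is none in the sample.
exact_v = [t for t in tags if re.fullmatch(r"V", t)]
print(exact_v)  # []
</code>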
  
Make sure the query type is set to CQL. You may also set the default attribute below the search window to //tag//, although this is not necessary. If you do, the square brackets and the attribute name can be left out of the query (i.e. you can type only ''"V.*"'' into the search box). You may start the query as follows:
''[tag="V.*"]''
  
This query alone would find all verbs in the corpus, but we need to limit the search to the verbs that end in //'d//. For this, use the ampersand (''&''), which stands for the logical operator AND. When you connect two or more attributes with ''&'', the resulting concordance includes only the occurrences that fulfil all the conditions specified in the query. The second condition concerns the //word// attribute; to find any word ending in //'d//, we can use another regular expression. Because a CQL regular expression has to match the whole value, the pattern ''.+'d'' only finds words that end in //'d//. This time, we use the symbol ''+'' instead of ''*'', since ''+'' matches one or more repetitions of the preceding character; this way, a token consisting of //'d// alone does not appear in the concordance.
  
''[tag="V.*" & word=".+'d"]''
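
To make the roles of ''&'' and ''+'' concrete, here is a Python sketch that applies both conditions to a handful of (word, tag) pairs. The tokens are invented for illustration, and the helper ''matches'' is hypothetical, not part of KonText; Python's ''re.fullmatch'' stands in for CQL's whole-value matching. The sketch also shows what would happen with ''.*'' instead of ''.+'': a token consisting of //'d// alone (for example, if //he'd// were split into //he// and //'d//) would slip into the results.

<code python>
import re

# Invented (word, tag) pairs for illustration only; the tags are plausible
# CLAWS-style labels, not real corpus annotations.
tokens = [
    ("he", "PPHS1"),
    ("'d", "VM"),
    ("call'd", "VVX"),
    ("rob'd", "VVX"),
    ("walked", "VVD"),
]


def matches(word, tag, word_pattern, tag_pattern):
    """Hypothetical helper: both conditions must hold, like '&' in CQL.

    re.fullmatch is used because CQL patterns must match the whole value.
    """
    return (re.fullmatch(tag_pattern, tag) is not None
            and re.fullmatch(word_pattern, word) is not None)


# [tag="V.*" & word=".+'d"]: at least one character before 'd
plus_hits = [w for w, t in tokens if matches(w, t, r".+'d", r"V.*")]
print(plus_hits)  # ["call'd", "rob'd"]

# With ".*" the bare token 'd would also be found
star_hits = [w for w, t in tokens if matches(w, t, r".*'d", r"V.*")]
print(star_hits)  # ["'d", "call'd", "rob'd"]
</code>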
Here, the word //something// is tagged as PN1, which corresponds to //indefinite pronoun, singular// in the tagset.
  
You may change this setting by clicking on //View → Corpus-specific settings// and selecting a different option listed under //How to display additional positional attributes?//.
  
You may have noticed that the forms you searched for (KWIC) are tagged as VVX. This tag is not part of the standard CLAWS 7 tagset; it was added specifically for the OBC during the tagging process. To read more about these corrections, see page 12 of the [[https://fedora.clarin-d.uni-saarland.de/oldbailey/downloads/OBC_2.0_Manual%202016-07-13.pdf|OBC Manual]]. It is therefore advisable not to rely on the tagset alone, but to check how the strings you are interested in are actually tagged and to build your query accordingly.
  
Let’s check the frequency list (//Frequency → Node forms [A=a]//) to see which verbs are most commonly contracted in this way.
  
{{:en:obc:l4_2.png?direct&300|}}
''[word="deposed" | word="asked" | word="called" | word="robbed"]''
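
In CQL, the vertical bar ''|'' joins conditions with OR: a token is included if at least one of them holds. The Python sketch below mimics this query on a small invented word list (the list and the variable names are assumptions for illustration, not corpus data). Because each ''word="..."'' condition here is a plain word without special characters, a whole-value match amounts to a simple string comparison; note that the contracted forms are not found.

<code python>
# Invented word list for illustration; not corpus data.
words = ["deposed", "depos'd", "asked", "ask'd", "called", "robbed", "said"]

# The four alternatives of the query, joined by "|" (OR) in CQL.
conditions = ["deposed", "asked", "called", "robbed"]

# A token is a hit if it satisfies at least one condition.
hits = [w for w in words if any(w == c for c in conditions)]
print(hits)  # ['deposed', 'asked', 'called', 'robbed']
</code>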
  
Go to the frequency list (//Frequency → Node forms [A=a]//) and compare:
  
{{:en:obc:l4_3.png?direct&300|}}
**Task:**
  
Try to find all plural nouns in the genitive case that are formed with the //'s// suffix:
    * Keep in mind the different tags for different classes of nouns
    * Make sure the query type is set to CQL
</WRAP>
  
You can find the solution [[en:obc:solution|here]].
  
----
  
**If you are ready, you can continue to [[en:obc:intro_to_metadata|Lesson 5]].**

----