en:obc:spelling [2020/02/27 12:19] jankocek
The OBC covers the period between 1720 and 1913. Even though the language was undergoing standardization and was heavily prescribed by normative grammarians, there is still a lot of variation in all areas of language use. This lesson focuses on spelling variation in the OBC.
  
Since the corpus is not lemmatized, you cannot search for a word in its canonical dictionary form and see all the variants that occur in the texts. However, you can search for individual variants using the basic query or the word form query, which were discussed in [[en:obc:query_types|the previous lesson]]. CQL lets you search for several specific word forms at once. Before doing so, consult a reliable source (e.g. the [[https://www.oed.com/|OED Online]]) to find all the variants that were in use during the given period.
  
**Some of the patterns of spelling variation in the 18<sup>th</sup> and 19<sup>th</sup> centuries**
|**Description**                                  |**Examples**                                                     |
|//-ic// spelled as //-ick//                      |//public(k), catholic(k), music(k), magic(k)//                   |
|//-ed// in past tense and participles as //’d//  |//call’d, cry’d, confess’d, ask’d//                              |
|//-or// / //-our// variation in BrE              |//favo(u)r, hono(u)r, colo(u)r, labo(u)r//                       |
|//s// / //z// variation                          |//surpris/ze//, //recognis/ze//, //apologis/ze//, //cruis/ze//   |
To search for multiple forms, set the query type to [[https://wiki.korpus.cz/doku.php/en:pojmy:dotazovaci_jazyk|CQL]]. Now let’s search for all the variants of the word //public//. According to the OED, the following forms were in use during the 18<sup>th</sup> and 19<sup>th</sup> centuries: //public//, //publik// and //publick//.
  
When using CQL, enclose each token in the query in square brackets ''[]''. The attribute specifies the type of search: type ''lemma='' to search for lemmas, ''tag='' to search for tags, and so on. Here you are looking for specific word forms, so use ''word=''. The search items themselves (words, lemmas, tags, etc.) must be enclosed in double quotation marks ''""''. For example, to search for the word //public//, type:
  
''[word="public"]''
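If you would like to see why this query finds only the exact form //public//, you can imitate the matching offline. The Python sketch below rests on an assumption: the quoted value is a regular expression that must match the **whole** token, as is usual in Manatee-based interfaces such as KonText. ''re.fullmatch'' reproduces this behavior. The token list is invented for illustration and is not OBC data.

```python
import re

# Assumption: a CQL value such as word="public" is a regular expression that
# must match the WHOLE token; re.fullmatch imitates this behavior.
def value_matches(pattern: str, token: str) -> bool:
    return re.fullmatch(pattern, token) is not None

# Invented example tokens, not corpus data.
tokens = ["public", "publick", "Public", "republic"]
print([t for t in tokens if value_matches("public", t)])  # ['public']
```

Neither //publick// nor //republic// is found. A value only counts as a match when it covers the entire word, so each variant has to be spelled out, either explicitly or with a pattern.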
  
Searching for all three forms at once requires the pipe symbol ''|'', which functions as an OR operator:
  
''[word="public" | word="publik" | word="publick"]''

(searches for //public// OR //publik// OR //publick//)
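The OR logic can be sketched in Python. Each string in the hypothetical ''ALTERNATIVES'' list stands for one ''word="..."'' part of the query, and a token is accepted if any of them matches it. As before, this assumes whole-token matching, imitated with ''re.fullmatch''. The tokens are invented.

```python
import re

# One entry per word="..." part of the CQL query; the pipe | means a token
# is accepted if ANY alternative matches it.
ALTERNATIVES = ["public", "publik", "publick"]

def matches_any(token: str) -> bool:
    # Assumption: each value must match the whole token (re.fullmatch).
    return any(re.fullmatch(p, token) for p in ALTERNATIVES)

# Invented example tokens, not corpus data.
tokens = ["public", "publik", "publick", "publike", "Public"]
print([t for t in tokens if matches_any(t)])  # ['public', 'publik', 'publick']
```

//Public// is not found. This anticipates the case-sensitivity issue discussed next.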
  
Keep in mind that CQL is case-sensitive. To find all occurrences of these words regardless of capitalization, you need to include the capitalized forms as well. To do so, insert another pair of square brackets inside the quoted value. The characters within these brackets form a set, and exactly one of them is matched at that position:
  
''[word="[Pp]ublic" | word="[Pp]ublik" | word="[Pp]ublick"]''
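The sketch below tests the ''[Pp]'' character set in Python, again assuming whole-token matching imitated with ''re.fullmatch'' and using invented tokens. It also shows the limit of this approach: only the first letter may vary, so an all-capitals form is still missed.

```python
import re

# [Pp] is a character set: exactly one of "P" or "p" at that position.
ALTERNATIVES = ["[Pp]ublic", "[Pp]ublik", "[Pp]ublick"]

def matches_any(token: str) -> bool:
    # Assumption: whole-token matching, imitated with re.fullmatch.
    return any(re.fullmatch(p, token) for p in ALTERNATIVES)

# Invented example tokens, not corpus data.
tokens = ["public", "Public", "Publick", "PUBLIC"]
print([t for t in tokens if matches_any(t)])  # ['public', 'Public', 'Publick']
```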
  
Alternatively, you can use the special sequence ''(?i)''. Placed right after the opening quotation mark, it makes the matching of that value case-insensitive:
  
''[word="(?i)public" | word="(?i)publik" | word="(?i)publick"]''
  
This query may be more suitable because it allows any of the letters to be capitalized (e.g. //PUBLIC//), so it may find more occurrences of the word.
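To compare this with the ''[Pp]'' version, here is the same kind of Python sketch with ''(?i)'' at the start of each value. Python's ''re'' module accepts ''(?i)'' as an inline flag at the beginning of a pattern. We assume, without having checked it against the corpus engine, that it behaves the same way there for patterns this simple. The tokens are invented.

```python
import re

# (?i) at the start of each value switches on case-insensitive matching.
ALTERNATIVES = ["(?i)public", "(?i)publik", "(?i)publick"]

def matches_any(token: str) -> bool:
    # Assumption: whole-token matching, imitated with re.fullmatch.
    return any(re.fullmatch(p, token) for p in ALTERNATIVES)

# Invented example tokens, not corpus data.
tokens = ["public", "Public", "PUBLIC", "PubLick", "publike"]
print([t for t in tokens if matches_any(t)])
# ['public', 'Public', 'PUBLIC', 'PubLick']
```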
  
According to the OED, in the 17<sup>th</sup> century these variants were sometimes written with a final //-e//. Suppose you want to include these forms in the search in case they still appeared in the 18<sup>th</sup>, 19<sup>th</sup> or early 20<sup>th</sup> century. To do so, use another regular-expression operator: the ''?'' symbol marks the element directly preceding it as optional. Hence:
  
''[word="(?i)publice?" | word="(?i)publike?" | word="(?i)publicke?"]''
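In the Python sketch below, ''e?'' lets each pattern match the form both with and without the final //e//. It still rejects anything extra at the end, such as the invented //publickes//. Whole-token matching is assumed and imitated with ''re.fullmatch''. The tokens are invented.

```python
import re

# e? makes the final "e" optional: "publice?" matches "public" and "publice".
ALTERNATIVES = ["(?i)publice?", "(?i)publike?", "(?i)publicke?"]

def matches_any(token: str) -> bool:
    # Assumption: whole-token matching, imitated with re.fullmatch.
    return any(re.fullmatch(p, token) for p in ALTERNATIVES)

# Invented example tokens, not corpus data.
tokens = ["public", "publice", "Publike", "publicke", "publickes"]
print([t for t in tokens if matches_any(t)])
# ['public', 'publice', 'Publike', 'publicke']
```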
  
To condense the query, combine what you have learned in the previous steps as follows:
  
''[word="(?i)publi[ck]k?e?"]''

The whole search is case-insensitive and covers all the forms previously entered separately. The initial sequence //publi// is present in all of them. It is followed by either //c// or //k//. The next character, //k//, is optional (it would most likely occur after //c//), and the final //e// is optional as well.
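To check that the condensed query loses nothing, you can compare it with the three-part query on a hand-picked list of candidate spellings. The Python sketch below does this, assuming whole-token matching (imitated with ''re.fullmatch''). Some of the candidates are invented combinations, not attested forms.

```python
import re

LONG_FORM = ["(?i)publice?", "(?i)publike?", "(?i)publicke?"]
CONDENSED = "(?i)publi[ck]k?e?"

# Hand-picked candidate spellings, including some invented combinations.
candidates = ["public", "publik", "publick", "publice", "publike",
              "publicke", "publikk", "publikke", "publicc"]

# Assumption: CQL values match the whole token, imitated with re.fullmatch.
long_hits = {c for c in candidates if any(re.fullmatch(p, c) for p in LONG_FORM)}
short_hits = {c for c in candidates if re.fullmatch(CONDENSED, c)}

print(long_hits <= short_hits)         # True: nothing from the long query is lost
print(sorted(short_hits - long_hits))  # ['publikk', 'publikke']
```

The condensed pattern is slightly broader. It would also accept combinations such as //publikk//, which are unlikely to occur. If they did occur, they would simply appear among the results.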

<WRAP round help 40%>
**Task:**

What should be the query to find all possible spellings of the noun //breeches//?
</WRAP>
  
After consulting the dictionary, you may expect the following forms: //breeches//, //breaches//, //brieches//, //briches//, //breetches//, //britches//.
  
Let’s begin by making the whole search case-insensitive: insert the sequence ''(?i)'' right after the opening quotation mark.
  
''[word="(?i)"]''
  
The first two characters, //br//, are present in all forms, but the vowels that follow vary. According to the OED, the first vowel alternates between //e// and //i//, so these two characters are enclosed in square brackets: ''[ei]''.
  
''[word="(?i)br[ei]"]''
  
The next vowel is either //e// or //a//, but it may also be absent (see //briches//). It is therefore written as ''[ea]'' followed by the question mark ''?'' to mark it as optional.
  
''[word="(?i)br[ei][ea]?"]''
  
Next comes an optional //t// (as in //breetches// and //britches//), followed by the sequence //ch//, which appears in all variants.
  
''[word="(?i)br[ei][ea]?t?ch"]''
  
All the forms end in //s//, which, according to the OED, is always preceded by //e//. However, to make sure we find all the variants that occur in the OBC, we may want to use regular expressions (more on this in [[en:obc:spell2|Lesson 3]]) to allow for other characters at this point. The plural ending may have been spelt in various ways, so it is advisable to use the sequence ''.*'' (see [[en:obc:spell3|Lesson 4]]), which stands for any sequence of characters, including none. The final query then looks like this:
  
''[word="(?i)br[ei][ea]?t?ch.*s"]''
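Before running the query in the corpus, you can check it offline against the dictionary forms. The Python sketch below assumes whole-token matching (imitated with ''re.fullmatch''). It tests the OED forms listed above, the three extra forms mentioned in this lesson, and a few invented non-target words. The invented //breachers// shows how broad ''.*'' is: the query would accept such a form if it occurred. This is one reason to inspect the list of matched forms.

```python
import re

QUERY = "(?i)br[ei][ea]?t?ch.*s"

def matches(token: str) -> bool:
    # Assumption: the CQL value must match the whole token (re.fullmatch).
    return re.fullmatch(QUERY, token) is not None

oed_forms = ["breeches", "breaches", "brieches", "briches", "breetches", "britches"]
corpus_only = ["breechees", "breachings", "breches"]
others = ["breach", "branches", "breachers"]  # invented test words

print(all(matches(t) for t in oed_forms))    # True
print(all(matches(t) for t in corpus_only))  # True
print({t: matches(t) for t in others})
# {'breach': False, 'branches': False, 'breachers': True}
```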
  
To view a list of all the variants that occur in the corpus, click //Frequency → Node forms [A=a]//.
  
Note the forms that do not appear in the OED list: //breechees//, //breachings// and //breches//.
  
{{:en:obc:l2_1a.png?direct&|}}
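A frequency list of matched forms can be roughly imitated offline. The Python sketch below runs the final //breeches// query over a small invented token list (not OBC data) and counts the matching forms with ''collections.Counter''. It folds case with ''str.lower()''. Whether //Node forms [A=a]// groups forms in the same way is our assumption, so compare the result with what the interface actually shows.

```python
import re
from collections import Counter

QUERY = "(?i)br[ei][ea]?t?ch.*s"

# Invented tokens for illustration only; these are NOT OBC data.
tokens = ["breeches", "Breeches", "britches", "breeches", "jacket", "breaches"]

# Assumption: whole-token matching, imitated with re.fullmatch.
hits = [t for t in tokens if re.fullmatch(QUERY, t)]

# Case is folded before counting (assumed to mirror the [A=a] list).
freq = Counter(t.lower() for t in hits)
for form, count in freq.most_common():
    print(form, count)
# breeches 3
# britches 1
# breaches 1
```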
  
----

**If you are ready, you can continue to [[en:obc:spell2|Lesson 3]].**

----