Lesson 2: Spelling

The OBC covers the period between 1720 and 1913. Although the language was undergoing standardization during this time and was being heavily prescribed by normative grammarians, a lot of variation can still be found in all areas of language use. This lesson focuses on spelling variation in the OBC.

Since the corpus is not lemmatized, it is impossible to search for a word in its canonical dictionary form and see all the variants which occur in the texts. However, you can search for individual variants using the basic query or the word form query, which were discussed in the previous lesson. To search for several specific word forms at once, use CQL. Before doing so, consult a reliable source (e.g. the OED Online) to find the variants which were in use during the given period.

Some of the patterns of spelling variation in the 18th and 19th centuries

Description | Examples
-ic spelled as -ick | public(k), catholic(k), music(k), magic(k)
-ed in past tense and participles spelled as ’d | call’d, cry’d, confess’d, ask’d
-or / -our variation in BrE | favo(u)r, hono(u)r, colo(u)r, labo(u)r
-s- / -z- variation | surpris/ze, recognis/ze, apologis/ze, cruis/ze

Searching the corpus

To search for multiple forms, set the query type to CQL. Now let’s search for all the variants of the word public. According to the OED, several forms were in use during the 18th and 19th centuries: public, publik, publick.

When using CQL, each element of the query has to be enclosed in square brackets []. The type of search you intend to conduct is specified by the attribute: to search for lemmas, type lemma=, for tags, type tag=, etc. In this case, you are looking for specific word forms, so word= should be used. The search items themselves (words, lemmas, tags etc.) must be enclosed in straight double quotation marks "". For example, to search for the word public, type:

[word="public"]

Searching for all of the three forms mentioned above simultaneously requires the use of the pipe symbol | which functions as an OR operator:

[word="public" | word="publik" | word="publick"]

(searches for public OR publik OR publick)
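The effect of the OR operator can be checked offline with a small sketch. It uses Python's re module as a stand-in for the corpus search and assumes that a word= value has to match the whole word form, not just a part of it; the helper matches_any and the sample tokens are made up for illustration.

```python
import re

# Hypothetical helper: emulates [word="a" | word="b" | ...] under the
# assumption that each value must match the entire word form.
def matches_any(token, values):
    return any(re.fullmatch(value, token) for value in values)

variants = ["public", "publik", "publick"]

# Made-up sample tokens, not taken from the corpus.
tokens = ["public", "publick", "Public", "republic", "publike"]

hits = [t for t in tokens if matches_any(t, variants)]
print(hits)  # ['public', 'publick']
```

Public is rejected because the comparison is case-sensitive, and republic and publike are rejected because only complete matches count.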

Keep in mind that CQL is case-sensitive. To find all occurrences of these words regardless of capitalization, you therefore need to add the forms with capital letters. To do this, insert another set of square brackets inside the quoted value; the characters within these square brackets form a set from which exactly one is selected:

[word="[Pp]ublic" | word="[Pp]ublik" | word="[Pp]ublick"]

Alternatively, you can use the character sequence (?i), which, when placed right after the opening quotation mark, makes the matching of that value case-insensitive:

[word="(?i)public" | word="(?i)publik" | word="(?i)publick"]

This query may be more suitable, as it allows for any of the letters to be capitalized, hence more occurrences of the word may be found.
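The difference between the two approaches can be illustrated with a short sketch in Python, whose re module also understands character sets and the (?i) prefix. This is only an approximation of the corpus search engine, and the sample tokens are invented.

```python
import re

set_pattern = "[Pp]ublick"          # only the first letter may vary in case
insensitive_pattern = "(?i)publick"  # every letter may vary in case

# Invented sample tokens with different capitalization.
tokens = ["publick", "Publick", "PUBLICK", "pUblick"]

set_hits = [t for t in tokens if re.fullmatch(set_pattern, t)]
insensitive_hits = [t for t in tokens if re.fullmatch(insensitive_pattern, t)]

print(set_hits)          # ['publick', 'Publick']
print(insensitive_hits)  # ['publick', 'Publick', 'PUBLICK', 'pUblick']
```

The character set only catches the capitalized and the lower-case form, while (?i) also catches words printed entirely in capitals.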

According to the OED, in the 17th century these variants were sometimes written with a final -e. Let’s say you want to include these forms in the search, in case they still appeared in the 18th or 19th (or 20th) century. To do so, let’s employ another regular expression. The ? symbol indicates that the element directly preceding it is optional. Hence:

[word="(?i)publice?" | word="(?i)publike?" | word="(?i)publicke?"]
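As a quick check of what the ? symbol does, here is a minimal sketch in Python; the re module is used as an approximation of the corpus search, whole-word matching is assumed, and the tokens are invented.

```python
import re

pattern = "(?i)publicke?"  # the final e may occur once or not at all

# Invented sample tokens.
tokens = ["publick", "publicke", "Publicke", "publickee", "publicker"]

hits = [t for t in tokens if re.fullmatch(pattern, t)]
print(hits)  # ['publick', 'publicke', 'Publicke']
```

The forms with two final e's or with an additional r are not matched, because ? allows at most one occurrence of the preceding character and nothing may follow it.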

If you wish to condense the query, simply combine what you have learned in the previous steps in the following manner:

[word="(?i)publi[ck]k?e?"]

The whole search is case-insensitive and covers all the forms which were previously entered separately. The initial sequence publi is present in all of them; it is followed by either c or k; the subsequent k is optional (it would most likely occur after c); and the final e is also optional.
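To see exactly which lower-case forms the condensed expression admits, one can enumerate them. The sketch below does this in Python, again only as an approximation of the corpus search and assuming whole-word matching.

```python
import re
from itertools import product

condensed = "(?i)publi[ck]k?e?"

# The six forms that were listed separately in the longer query.
targeted = ["public", "publice", "publik", "publike", "publick", "publicke"]

# Every lower-case string the condensed pattern can describe:
# publi + (c or k) + optional k + optional e.
candidates = ["publi" + a + b + c
              for a, b, c in product("ck", ["", "k"], ["", "e"])]

accepted = [w for w in candidates if re.fullmatch(condensed, w)]
extras = sorted(set(accepted) - set(targeted))

print(len(accepted))  # 8
print(extras)         # ['publikk', 'publikke']
```

The condensed query covers all six targeted forms and additionally admits publikk and publikke. Such over-generation is usually harmless, since forms that do not occur in the corpus simply return no hits.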

Task:

What should be the query to find all possible spellings of the noun breeches?

After consulting the dictionary, you may expect the following forms: breeches, breaches, brieches, briches, breetches, britches.

Let’s begin by making the whole search case-insensitive: insert the sequence (?i) right after the opening quotation mark.

[word="(?i)"]

The first two characters br are present in all forms, but the following vowels display some degree of variation. The first vowel, according to the OED, alternates between e and i, so these two characters need to be enclosed in square brackets: [ei].

[word="(?i)br[ei]"]

The next vowel is either e or a, but it is optional (see briches), so use [ea] followed by the question mark ? to signal optionality.

[word="(?i)br[ei][ea]?"]

Next may come the consonant t (as in breetches and britches), which is optional and is therefore written as t?. It is followed by the sequence ch, which appears in all variants.

[word="(?i)br[ei][ea]?t?ch"]
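Before adding the ending, it may help to verify that the expression built so far really describes the beginning of every listed form. The sketch below does this in Python with re.match, which only checks the start of a string; it is an offline approximation, not a corpus query.

```python
import re

stem = re.compile("(?i)br[ei][ea]?t?ch")

# The six forms listed above.
oed_forms = ["breeches", "breaches", "brieches",
             "briches", "breetches", "britches"]

# For each form, record what is left over after the stem has matched.
remainders = {}
for form in oed_forms:
    m = stem.match(form)
    remainders[form] = form[m.end():] if m else None

print(remainders)
```

In every case the stem matches and the part left over is es, which is the ending dealt with in the next step.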

All the forms end with a final s, and according to the OED it is always preceded by e. However, to make sure the search finds all the variants occurring in the OBC, you may want to use some regular expressions (more on this in Lesson 3) to allow for other characters. The plural ending might have been spelt in various ways, so it is recommended to employ the sequence .* (see Lesson 4), which represents any sequence of characters (or none). The final query should then look like this:

[word="(?i)br[ei][ea]?t?ch.*s"]

To view the list of all the variants which occur in the corpus, click on Frequency → Node forms [A=a].

Note the forms which were not included in the list available in the OED: breechees, breachings and breches.
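The reason for preferring .* over a fixed ending can be illustrated with a short comparison. The sketch below uses Python's re module as an approximation of the corpus search and assumes whole-word matching; the stricter pattern with a fixed es ending is a hypothetical alternative, not a query from this lesson.

```python
import re

wildcard = "(?i)br[ei][ea]?t?ch.*s"  # final query from this lesson
strict = "(?i)br[ei][ea]?t?ches"     # hypothetical variant with a fixed ending

oed_forms = ["breeches", "breaches", "brieches",
             "briches", "breetches", "britches"]
corpus_extras = ["breechees", "breachings", "breches"]
all_forms = oed_forms + corpus_extras

wildcard_hits = [w for w in all_forms if re.fullmatch(wildcard, w)]
strict_missed = [w for w in all_forms if not re.fullmatch(strict, w)]

# Words that should not be found: wrong vowel, no final s, no ch.
rejected = [w for w in ["branches", "breech", "bridges"]
            if re.fullmatch(wildcard, w)]

print(len(wildcard_hits))  # 9
print(strict_missed)       # ['breechees', 'breachings']
print(rejected)            # []
```

The query with .* finds all nine forms, whereas the fixed ending would have missed breechees and breachings. A broad pattern can also admit unrelated words, so it is worth inspecting the frequency list before drawing conclusions.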


If you are ready, you can continue to Lesson 3.