

Lesson 8: Multiword searches

In the Early Modern English period, there were two ways of marking the perfect tenses. In present-day English the auxiliary have is used, for example in the present perfect It has come to my attention. However, as late as the eighteenth century, the perfect tenses could also be marked with the auxiliary to be. The two markings were more or less in complementary distribution, i.e. they were used with different types of verbs. According to the OED, to be was the preferred way of forming the perfect of verbs of motion, while to have was used in most other cases. Shakespeare normally uses the auxiliary to be with creep, enter, flee, go, meet, retire, ride, and run.

Searching the corpus

If we are searching for one specific form, such as is arrived, we may use the basic query described in our first lesson.

However, in this case we want to find all the possible variants (am come, are come, etc.). Such a query can be written in CQL (Corpus Query Language) with the help of regular expressions. Furthermore, the CQL query mode in the KonText interface is case sensitive, so both the lowercase and the capitalised variants should be included in the query in order to obtain as many relevant hits as possible.

For example, searching for the two verb forms am and are simultaneously requires the vertical bar |, a regular-expression operator meaning "or", i.e. the query returns either am or are. Such a query is written as [word="am"]|[word="are"]

By adding the past participle of a selected verb to the query, for example arrived, we get [word="am"]|[word="are"][word="arrived"]. However, this only returns the node forms am and are arrived, because the vertical bar separates everything to its left from everything to its right. This is remedied with round brackets (), which create a hierarchy within the query by giving the text within the brackets a higher priority. The forms am arrived, are arrived, and is arrived are therefore all covered by the following query:

([word="am"]|[word="are"]|[word="is"])[word="arrived"]
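The effect of the brackets can be mimicked with ordinary regular expressions, where | likewise binds more loosely than a plain sequence. The sketch below is only an analogy in Python: it matches space-separated strings rather than corpus positions, and the sample phrases and variable names are illustrative, not part of KonText.

```python
import re

# Toy input: three two-word phrases and one single word.
phrases = ["am arrived", "are arrived", "is arrived", "am"]

# Without grouping, the bar splits the whole pattern:
# either "am" alone, or "are arrived".
ungrouped = re.compile(r"am|are arrived")

# With grouping, the bar only chooses between the auxiliaries,
# and "arrived" must follow whichever one is picked.
grouped = re.compile(r"(am|are|is) arrived")

ungrouped_hits = [p for p in phrases if ungrouped.fullmatch(p)]
grouped_hits = [p for p in phrases if grouped.fullmatch(p)]

print(ungrouped_hits)  # ['are arrived', 'am']
print(grouped_hits)    # ['am arrived', 'are arrived', 'is arrived']
```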

If we want the search to include both variants, i.e. be and have, we can include all of the possible forms in the query. Furthermore, we want to include all the possible spelling variants (see our second lesson). The final query could look like this:

([word="am"]|[word="are"]|[word="[iy]s"]|[word="has"]|[word="ha[uv]e"])[word="ar?ri[uv]ed"]
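Long queries of this kind are easy to mistype, so it can help to assemble them from a list of auxiliary patterns. The following is a minimal sketch in Python; the helper names token and any_of are hypothetical conveniences, not part of CQL or KonText, and the output uses straight double quotes.

```python
def token(pattern: str, attr: str = "word") -> str:
    """Wrap a regular expression in a single CQL position, e.g. [word="am"]."""
    return f'[{attr}="{pattern}"]'


def any_of(patterns: list[str]) -> str:
    """Join several positions with | and bracket the whole group."""
    return "(" + "|".join(token(p) for p in patterns) + ")"


# Auxiliary forms and participle pattern taken from the query above.
auxiliaries = ["am", "are", "[iy]s", "has", "ha[uv]e"]
participle = "ar?ri[uv]ed"

query = any_of(auxiliaries) + token(participle)
print(query)
```

The printed string can be pasted into the CQL query field. Further forms of the auxiliaries or another participle pattern only require changing the two variables.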

Remember that the construction to be + past participle is also used to express the passive voice. Although this difficulty does not arise with arrive, we must be careful when searching for transitive verbs.

After consulting an etymological dictionary, and drawing on our previous experience, we know that for the word divine we may expect forms such as diuine, deuine, dywine, and divinne (among many others). The first vowel can be either e, i or y, which is written as the set [eiy]. The following consonant is handled in the same way with the set [uvw]. The form always contains at least one nasal n, while the second n is optional (indicated by the question mark). The final silent e is also optional. The resulting query should look like this:

[word="[dD][eiy][uvw][iy]nn?e?"]
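Before running the query on the corpus, the pattern can be tried out on the expected spellings. The sketch below uses Python's re.fullmatch on the assumption that the value of word has to match the whole token; Python's regular expressions are only a stand-in for the corpus engine, and the word define is added purely as a counterexample.

```python
import re

# The regular expression from the word attribute of the query above.
pattern = re.compile(r"[dD][eiy][uvw][iy]nn?e?")

# Spellings mentioned in the text, plus a capitalised form,
# a form without final e, and one word that should not match.
candidates = ["diuine", "deuine", "dywine", "divinne", "Divine", "divin", "define"]

# Map each candidate to True/False depending on whether the whole word matches.
results = {form: bool(pattern.fullmatch(form)) for form in candidates}

for form, matched in results.items():
    print(f"{form:10} {'match' if matched else 'no match'}")
```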

Frequency → Node forms provides a listing of all types found with the given query in order of frequency.

EEBO_pic02.png

A similar example is the word godly:

Godly: [word="[gG]oo?dle?[yi]c?k?e?"]

This query returns not only the frequent forms godly, goodly, godlie, godlye and goodlye, but also much less frequent (and much less anticipated) variants such as godlyc and even godlycke. In the latter two we can observe remnants of the Old English adjectival suffixes <-līc> and <-līce>.
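A permissive pattern with several optional letters may also let through forms that were not intended, so it is worth probing it with a few near neighbours. The following sketch again treats Python's re.fullmatch as an approximation of whole-token matching; godlike and gold are test words chosen here, not forms reported from the corpus.

```python
import re

# The regular expression from the word attribute of the godly query.
pattern = re.compile(r"[gG]oo?dle?[yi]c?k?e?")

# Forms listed in the text as results of the query.
attested = ["godly", "goodly", "godlie", "godlye", "goodlye", "godlyc", "godlycke"]

# Near neighbours used to probe how much else the pattern admits.
probes = ["godlike", "gold"]

attested_ok = [form for form in attested if pattern.fullmatch(form)]
probe_hits = [form for form in probes if pattern.fullmatch(form)]

print(attested_ok)  # all seven attested forms
print(probe_hits)   # ['godlike']
```

Because c, k and the final e are each optional, the sequence ike is admitted as well, so under this assumption hits such as godlike would have to be checked in the Node forms listing and set aside by hand if they are not wanted.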

Task: Spelling variants

  • Find as many spelling variants of the word royal as possible
  • Keep in mind the spelling conventions and irregularities mentioned above
  • Make sure that the Query type is set to CQL