Lesson 8: Multiword searches

In the Early Modern English period, there were two different ways of marking the perfect tense. In the present day, the auxiliary have is used to form the present perfect, as in It has come to my attention. However, as late as the eighteenth century, the perfect tenses could also be marked with the auxiliary to be. These two markings were more or less in complementary distribution, i.e. they were used with different types of verbs. According to the OED, to be was the preferred way of forming the perfect of verbs of motion, while to have was used in most other cases. Shakespeare normally uses the auxiliary to be with creep, enter, flee, go, meet, retire, ride, and run.

Searching the corpus

If we are searching for one specific form, such as is arrived, we may use the Phrase query type as described in our first lesson.

However, in this case we want to find all the possible variants (am come, are come, etc.), so the query is better written in CQL (Corpus Query Language) with the help of regular expressions. Note that the CQL query mode in the KonText interface is case sensitive; both the capitalised and the lower-case variants should therefore be included in the query in order to obtain as many relevant hits as possible.

For example, searching for two forms of the verb to be, am and are, simultaneously requires the use of the vertical bar |, a regular-expression operator functioning as “or”, i.e. it returns either am or are. Such a query is written as [word="am"]|[word="are"]

By adding the past participle form of a selected verb to the query, for example arrived, we get [word="am"]|[word="are"][word="arrived"]. However, this returns only the node forms am and are arrived, because the vertical bar splits the whole query into two alternatives: am on its own, or are followed by arrived. This is remedied with the help of round brackets (), which enable us to create a hierarchy within the query, giving the text within the brackets a higher priority. The forms am arrived, are arrived, and is arrived are therefore all covered by the following query:

([word="am"]|[word="are"]|[word="is"])[word="arrived"]
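The effect of the brackets can be imitated outside the corpus with an ordinary regular expression. The Python sketch below is only an analogy: it treats a sentence as a plain string rather than as a sequence of corpus tokens, and the sample sentence is invented for illustration.

```python
import re

# Invented sample text; it is not taken from the corpus.
text = "I am here, they are arrived, and he is arrived"

# Without grouping, "|" splits the whole pattern in two:
# either "am" on its own, or "are arrived".
ungrouped = re.compile(r"\bam\b|\bare arrived\b")

# With grouping, the alternatives are limited to the auxiliary,
# and "arrived" must follow whichever auxiliary matches.
grouped = re.compile(r"\b(?:am|are|is) arrived\b")

ungrouped_hits = ungrouped.findall(text)
grouped_hits = grouped.findall(text)

print(ungrouped_hits)  # ['am', 'are arrived']
print(grouped_hits)    # ['are arrived', 'is arrived']
```

The ungrouped pattern picks up the bare am in "I am here", which is exactly the kind of unwanted hit the round brackets prevent.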

If we want the search to include both variants, i.e. be and have, we can include all of the possible forms in the query. Furthermore, we want to include all the possible spelling variants (see Lesson Two). The final query could look like this:

([word="am"]|[word="are"]|[word="[iy]s"]|[word="has"]|[word="ha[uv]e"])[word="ar?ri[uv]ed"]
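Longer queries of this kind are easy to mistype, so it can help to assemble them from a list and to check the spelling patterns before pasting the query into KonText. The following Python sketch is one possible way of doing so. The helper names build_query and matches_token are hypothetical, the test spellings are invented, and the check assumes that a CQL pattern has to match the whole word form, which Python's re.fullmatch approximates. The query is written with straight double quotation marks.

```python
import re

# Patterns copied from the query above.
AUXILIARIES = ["am", "are", "[iy]s", "has", "ha[uv]e"]
PARTICIPLE = "ar?ri[uv]ed"


def build_query(auxiliaries, participle):
    """Assemble a CQL string: (aux1|aux2|...) followed by the participle."""
    alternatives = "|".join(f'[word="{p}"]' for p in auxiliaries)
    return f'({alternatives})[word="{participle}"]'


def matches_token(pattern, token):
    """Approximate the corpus behaviour: the pattern must cover the whole token."""
    return re.fullmatch(pattern, token) is not None


query = build_query(AUXILIARIES, PARTICIPLE)
print(query)

# Invented test spellings, used only to check the participle pattern.
for spelling in ["arrived", "arriued", "ariued", "Arrived"]:
    print(spelling, matches_token(PARTICIPLE, spelling))
```

The capitalised spelling Arrived is rejected by the last check, which reflects the case sensitivity of the pattern: sentence-initial forms have to be added to the query explicitly.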

Frequency → Node forms provides a listing of all types found with the given query in order of frequency. Below are the results of the search viewed as node forms.

By selecting the positive filter, we can view the node forms individually, e.g. all the instances of have arrived. Alternatively, we can conduct one separate search for all the variants with the auxiliary have and another for all those with the auxiliary be.

A number of things can be done at this stage, including a survey of the relative frequency, given in i.p.m. (instances per million), by period.

Frequency → Text Types on the menu bar shows us a list of frequencies by period and by decade. By default, the results are ordered by frequency. By clicking on the text doc.decade we can order the results chronologically.
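For readers who export the figures and want to recompute or re-sort them, the arithmetic behind i.p.m. is simple. The Python sketch below assumes that i.p.m. means the number of hits per one million tokens of the relevant part of the corpus. The function name ipm is hypothetical, and all counts and sizes are invented for illustration; they are not results from the corpus.

```python
def ipm(hits, subcorpus_size):
    """Relative frequency: hits per one million tokens of the (sub)corpus."""
    if subcorpus_size <= 0:
        raise ValueError("subcorpus_size must be positive")
    return hits / subcorpus_size * 1_000_000


# Hypothetical figures, invented for illustration only:
# decade -> (number of hits, number of tokens in that decade)
counts = {
    "1590": (12, 2_000_000),
    "1600": (9, 3_000_000),
    "1580": (3, 500_000),
}

# Sort chronologically by decade rather than by frequency.
for decade in sorted(counts):
    hits, size = counts[decade]
    print(decade, round(ipm(hits, size), 2))
```

Because i.p.m. is normalised by size, decades represented by very different amounts of text can be compared directly, which raw hit counts do not allow.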

Lesson Three gives an example of how such data could be converted into a graph.

Remember that the construction to be + past participle is also used as a means of expressing the passive voice. Although this difficulty does not arise with arrive, we must keep the possibility in mind when searching for transitive verbs. In the case of a verb such as enter, where the phrase is entered can potentially have both the perfect and the passive meaning, there is unfortunately not much we can do to eliminate the undesired variant. Another complication arises from the use of the participle as an adjective, for example I am ashamed of…