Lesson 6: Specify query (Metadata continued)
In this lesson, we will look at how you can use the KonText interface to specify or limit the query based on the metadata before the search itself is initiated.
Due to the complexity of the trial proceedings, a number of complicating factors may affect your search and results. One of them is that a single trial may involve more than one defendant and hence more than one offence, verdict, and punishment. Moreover, it is not always apparent from the text of the proceedings which defendant was speaking at a particular moment (utterances are often marked simply by the role of the participant, as in D for defendant), and different defendants may have been charged with different offences or received different verdicts and punishments. Therefore, it was not possible to mark individual utterances with the offence, verdict, and punishment attributes. Instead, whole proceedings (texts) rather than the utterances within them were marked with these attributes, and separate categories containing combinations of these elements were created. You may therefore encounter, for example, the text attribute offenceCategory with combined values such as breakingPeace | theft | violentTheft or kill | theft | violentTheft, meaning that the particular proceeding concerns all of these types of offences. It is important to keep this in mind when inputting your query.
Searching the corpus
To search the corpus using a specified query, open the KonText interface and make sure you have the OBC selected. You can choose any query type; for this lesson, let’s use the basic type. Let’s say we are interested in the language of 19th-century women who were convicted of theft and either transported or sentenced to death, and we would like to know how frequently they used interrogative sentences. To find interrogative sentences, simply type ? into the search box. To specify the characteristics of the utterances we are looking for, click on Restrict search.
Now you can limit your search by ticking the appropriate boxes. The numbers in the right-hand column indicate how many positions (tokens) fall within the given category (e.g. the Advertisements texts amount to 125,453 tokens).
There are five types of texts in the OBC:
- Advertisements
- Front matter: includes general information about the session – names of court officials, people responsible for the transcription etc.
- Punishment summary: list of the sentences and punishments administered by the court
- Supplementary material
- Trial account: includes information about the defendants, witnesses, victims, descriptions of the crimes and transcriptions of the testimonies
Firstly, to get the actual utterances of the defendants, select the trialAccount category only. As we are interested in the language of the 19th century, delimit the corresponding time span in the text.year box. In the text.offenceCategory box you will find many different combinations of offences and, as mentioned above, you need to be careful when making your selection. Multiple offences separated by a vertical bar indicate that the trial involved more than one offence, typically because several defendants were tried together. Working out which person committed which crime and what punishment each received can be quite a demanding task, as you would have to go through each trial account individually and read the transcription.
So, to make sure you include only people convicted of theft, select the options which contain nothing but theft. Here, you have three choices: theft, violentTheft, or theft | violentTheft. Selecting all three still guarantees that only people convicted of theft are included in your search. Note the trade-off, however: because the other categories which include theft (e.g. deception | sexual | theft) are left out, the search will not cover all the trials which deal with the offence of theft.
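The difference between the strict selection (only theft) and the broad one (any category that mentions theft) can be illustrated with a minimal Python sketch. It is not part of KonText: the list of category values is a small hypothetical sample, and the helper names are made up for illustration. It only assumes, as described above, that combined values are separated by a vertical bar.

```python
# Hypothetical sample of text.offenceCategory values; the real list in KonText is longer.
categories = [
    "theft",
    "violentTheft",
    "theft | violentTheft",
    "deception | sexual | theft",
    "breakingPeace | theft | violentTheft",
    "kill",
]

# The offences treated as theft in this lesson.
THEFT_OFFENCES = {"theft", "violentTheft"}


def offences(value):
    """Split a combined category value on the vertical bar."""
    return {part.strip() for part in value.split("|")}


def only_theft(value):
    """True when every offence in the value is a theft offence."""
    return offences(value) <= THEFT_OFFENCES


def mentions_theft(value):
    """True when at least one offence in the value is a theft offence."""
    return bool(offences(value) & THEFT_OFFENCES)


# Strict selection: every defendant in these trials was tried for theft.
strict = [c for c in categories if only_theft(c)]
# Broad selection: covers all theft trials, but mixes in other offences.
broad = [c for c in categories if mentions_theft(c)]

print(strict)
print(broad)
```

The strict list corresponds to the three boxes ticked in this lesson; the broad list shows which additional boxes you would have to tick to cover every trial involving theft, at the cost of no longer knowing which defendant was tried for which offence.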
Next, the punishment needs to be selected. Find the text.punishmentCategory box and select death, death | transport and transport (for more information on offences, verdicts and punishments, see here). You also need to select the role of the utterance speaker, so as not to include utterances spoken by, for example, the judge. Go to the utterance.speaker_role box and select Defendant. Lastly, find the utterance.speaker_sex box and select f (female). You can delimit your search further by modifying any of the categories available. When you are satisfied with your selection, hit the search button. You can view the Text types frequency list (Frequency → Text Types) to see all variables, including those which you did not specify in your query.
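To see how the ticked boxes combine, the whole selection can be written out as a single filter. The following Python sketch is only an illustration of the logic, not of how KonText works internally: the Utterance record, its field names, the sample rows, and the role value Judge are hypothetical, and the 19th century is assumed here to mean the years 1800 to 1899.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """Hypothetical record carrying the metadata used in this lesson."""
    text_type: str
    year: int
    offence: str
    punishment: str
    speaker_role: str
    speaker_sex: str


# Values ticked in the text.offenceCategory and text.punishmentCategory boxes.
OFFENCES = {"theft", "violentTheft", "theft | violentTheft"}
PUNISHMENTS = {"death", "death | transport", "transport"}


def matches(u):
    """All conditions must hold at once, as with the ticked boxes."""
    return (
        u.text_type == "trialAccount"
        and 1800 <= u.year <= 1899  # assumed bounds for the 19th century
        and u.offence in OFFENCES
        and u.punishment in PUNISHMENTS
        and u.speaker_role == "Defendant"
        and u.speaker_sex == "f"
    )


# Made-up sample rows, for illustration only.
sample = [
    Utterance("trialAccount", 1832, "theft", "transport", "Defendant", "f"),
    Utterance("trialAccount", 1832, "theft", "transport", "Judge", "m"),
    Utterance("trialAccount", 1765, "theft", "death", "Defendant", "f"),
    Utterance("trialAccount", 1850, "deception | sexual | theft", "transport", "Defendant", "f"),
]

selected = [u for u in sample if matches(u)]
print(len(selected))
```

Within one box, the ticked values act as alternatives (any of the three punishments will do), while the boxes themselves are combined so that every one of them must be satisfied.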
Task:
- Find all occurrences of the word God and combine the following parameters:
  - spoken during the trial
  - only in the 18th century
  - the defendant was found guilty
  - spoken by the victim or witness
  - spoken by a male
- View the Text Types frequency lists and see whether the speakers come from a high- or low-class environment, and which publisher was most frequently responsible for publishing these specific proceedings.
- Did the victims or the witnesses use the word more often?
You can find the solution here.
If you are ready, you can continue to Lesson 7.