Differences
This shows you the differences between two versions of the page.
en:obc:intro_to_metadata [2020/02/19 12:13] – Michal Škrabal | en:obc:intro_to_metadata [2021/02/10 18:26] (current) – Michal Křen | ||
---|---|---|---|
Line 10: | Line 10: | ||
|// | |// | ||
|// | |// | ||
+ | |// | ||
|// | |// | ||
|// | |// | ||
|// | |// | ||
+ | |// | ||
|// | |// | ||
|// | |// | ||
Line 24: | Line 26: | ||
There are two issues to be addressed. Firstly, not every variable is always known, so some information may occasionally be missing. | There are two issues to be addressed. Firstly, not every variable is always known, so some information may occasionally be missing. | ||
- | Secondly, oftentimes the trials involved multiple defendants (and hence multiple offences, punishments, | + | Secondly, oftentimes the trials involved multiple defendants (and hence multiple offences, punishments, |
The direct speech in the text is tagged as individual utterances, which are assigned the following parameters: | The direct speech in the text is tagged as individual utterances, which are assigned the following parameters: | ||
- | * Sociobiographical: | + | * Sociobiographical: |
* Pragmatic: speaker’s role in the court (defendant, lawyer, judge, witness etc.) | * Pragmatic: speaker’s role in the court (defendant, lawyer, judge, witness etc.) | ||
* Textual: scribe, printer, publisher of the individual proceedings (these are already provided in the metadata of the text, but providing these parameters at the utterance level makes some types of queries much simpler) | * Textual: scribe, printer, publisher of the individual proceedings (these are already provided in the metadata of the text, but providing these parameters at the utterance level makes some types of queries much simpler) | ||
Line 34: | Line 36: | ||
|**“utterance” structure attributes**|**Description** | |**“utterance” structure attributes**|**Description** | ||
- | |//editor // | + | |//editor // |
|//id // | |//id // | ||
|//n // |number of the utterance in the proceedings|// | |//n // |number of the utterance in the proceedings|// | ||
Line 41: | Line 43: | ||
|//printer // |printer of the text | |//printer // |printer of the text | ||
|// | |// | ||
- | |//scribe // | + | |//scribe // |
|// | |// | ||
|// | |// | ||
Line 52: | Line 54: | ||
**Searching the corpus** | **Searching the corpus** | ||
- | Verbs in the progressive passive tense are formed by the auxiliary verb //be// followed by the present participle form //being// plus the past participle of a full verb, e.g. //I am being watched//, //the house was being built.// Searching for such constructions is done best by the use of tags (see Lesson 4). | + | Verbs in the progressive passive tense are formed by the auxiliary verb //be// followed by the present participle form //being// plus the past participle of a full verb, e.g. //I am being watched//, //the house was being built.// Searching for such constructions is done best by the use of tags (see [[en: |
- | For the auxiliary verb, we need to search for //am//, //are//, //is//, '' | + | For the auxiliary verb, we need to search for //am//, //are//, //is//, '' |
'' | '' | ||
Line 62: | Line 64: | ||
'' | '' | ||
- | Alternatively, | + | Alternatively, |
For the lexical verb, we are looking for all past participles. According to the tagset, this verb form is tagged either as VVN or VVNK. Hence, we can use the shortened version VVN.*. The resulting query should look like this: | For the lexical verb, we are looking for all past participles. According to the tagset, this verb form is tagged either as VVN or VVNK. Hence, we can use the shortened version VVN.*. The resulting query should look like this: | ||
Line 68: | Line 70: | ||
'' | '' | ||
- | If you wish to see an overview of the structural attributes of the whole concordance along with their frequencies, | + | If you wish to see an overview of the structural attributes of the whole concordance along with their frequencies, |
{{: | {{: | ||
Line 76: | Line 78: | ||
{{: | {{: | ||
- | It is important to note here, that some of the utterances are not tagged fully; in this case, there are 48 utterances that are missing the information about the decade in which they were written. You can use the negative filter (p/**n**) to discard them and work only with the fully annotated data. | + | It is important to note here, that some of the utterances are not tagged fully; in this case, there are 48 utterances that are missing the information about the decade in which they were written. You can use the negative filter (p///n//) to discard them and work only with the fully annotated data. |
By clicking on the header of each column, you can change the sorting – alphabetically according to the labels of that attribute (here decades), according to the frequency or i.p.m. Here i.p.m. (Items Per Million) indicates the relative frequency of the given form in relation to the overall size of the part of the corpus tagged with the respective value of the structural attribute (e.g. in this case the number of occurrences per million tokens in each decade). The relative frequency allows for comparison of the number of occurrences in differently-sized parts of the corpus. | By clicking on the header of each column, you can change the sorting – alphabetically according to the labels of that attribute (here decades), according to the frequency or i.p.m. Here i.p.m. (Items Per Million) indicates the relative frequency of the given form in relation to the overall size of the part of the corpus tagged with the respective value of the structural attribute (e.g. in this case the number of occurrences per million tokens in each decade). The relative frequency allows for comparison of the number of occurrences in differently-sized parts of the corpus. | ||
Line 86: | Line 88: | ||
{{: | {{: | ||
- | Here you can see all the information available for the given utterance. As was mentioned above, some information may be missing. You can access the whole text of the proceeding including the scan of the original publication by clicking on the link under **text.url**. | + | Here you can see all the information available for the given utterance. As was mentioned above, some information may be missing. You can access the whole text of the proceeding including the scan of the original publication by clicking on the link under //text.url//. |
<WRAP round help 40%> | <WRAP round help 40%> | ||
Line 95: | Line 97: | ||
* Make use of the tags from the tagset | * Make use of the tags from the tagset | ||
* Look at the text types list and find when, in which contexts (e.g. type of offence) and by whom these structures were most frequently used | * Look at the text types list and find when, in which contexts (e.g. type of offence) and by whom these structures were most frequently used | ||
- | </ | + | </ |
- | Solution: | + | You will find the solution [[en:obc: |
- | [[https:// | + | ---- |
- | + | ||
- | Query: '' | + | |
- | + | ||
- | **Frequency → Text Types** | + | |
- | + | ||
- | {{: | + | |
- | + | ||
- | {{: | + | |
- | + | ||
- | [[https:// | + | |
- | + | ||
- | Query: '' | + | |
- | + | ||
- | **Frequency → Text Types** | + | |
- | + | ||
- | {{: | + | |
- | + | ||
- | {{: | + | |
+ | **If you are ready, you can continue to [[en: | ||
+ | ---- |
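
The i.p.m. (Items Per Million) value described in the lesson text is the number of hits divided by the size of the part of the corpus tagged with the given attribute value, scaled to one million tokens. The following is a minimal sketch of that arithmetic, not the corpus interface's own implementation; the function name ''items_per_million'' and all figures are hypothetical and serve only as an illustration.

```python
def items_per_million(hits: int, subcorpus_tokens: int) -> float:
    """Relative frequency: occurrences per million tokens of the subcorpus."""
    if subcorpus_tokens <= 0:
        raise ValueError("subcorpus size must be a positive number of tokens")
    # Multiply before dividing so that round figures stay exact.
    return hits * 1_000_000 / subcorpus_tokens


# Hypothetical figures for illustration only (NOT taken from the corpus):
# decade label -> (number of hits, number of tokens tagged with that decade)
example_counts = {
    "1750s": (12, 1_500_000),
    "1850s": (90, 3_000_000),
}

for decade, (hits, size) in example_counts.items():
    print(f"{decade}: {hits} hits, {items_per_million(hits, size):.1f} i.p.m.")
```

Because each count is divided by the size of its own decade, the resulting values are on a common scale and can be compared even when the decades differ considerably in size.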