Differences
This shows you the differences between two versions of the page.
en:manualy:kontext:korpusy [2018/08/06 10:54] – [Using subcorpora] michalskrabal | en:manualy:kontext:korpusy [2023/03/09 11:05] – jankocek | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Menu: Corpora ====== | ====== Menu: Corpora ====== | ||
- | ====== Available Corpora | + | FIXME The first menu item **Available corpora** opens a page where the user can search all corpora to which they have access. |
+ | |||
+ | The remaining three items are dedicated to **virtual subcorpora** (i.e. subsets of texts from the original corpus). Here it is possible both to create your own subcorpus and to manage existing subcorpora (work with the subcorpus draft, view finished subcorpora, archive them, delete them, etc.). | ||
+ | |||
+ | Subcorpora are tied to the user account. Virtual subcorpora are therefore available to [[en: | ||
+ | |||
+ | Generally speaking, a subcorpus is only an additional condition which is applied to all queries in the search. For example, if we are searching for the lemma //dřevo// in the fiction subcorpus SYN2020: | ||
+ | |||
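The idea that a subcorpus is just an extra condition added to every query can be sketched in a few lines of Python. This is only an illustration, not how KonText is implemented: the `Token` class, the `txtype` field and the `search` function are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Token:
    lemma: str
    txtype: str  # hypothetical text-type label of the text the token comes from


def search(tokens: Iterable[Token], lemma: str,
           subcorpus: Optional[Callable[[Token], bool]] = None) -> List[Token]:
    """Return tokens with the given lemma; a subcorpus is an optional extra filter."""
    hits = [t for t in tokens if t.lemma == lemma]
    if subcorpus is not None:
        # The subcorpus only narrows the result; the query itself is unchanged.
        hits = [t for t in hits if subcorpus(t)]
    return hits


# Toy data standing in for a corpus (invented for illustration).
corpus = [
    Token("dřevo", "fiction"),
    Token("dřevo", "journalism"),
    Token("les", "fiction"),
]

def fiction_only(token: Token) -> bool:
    return token.txtype == "fiction"

all_hits = search(corpus, "dřevo")
fiction_hits = search(corpus, "dřevo", subcorpus=fiction_only)
```

The same query returns fewer hits once the subcorpus condition is attached, which is all that "searching in a subcorpus" means in this simplified model.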
+ | ===== Available Corpora ===== | ||
A list of all the corpora available to the user is accessible via the menu item **Corpora → Available corpora**. Due to the large number of corpora and their respective versions, **following the first login**, the user is shown a pre-filtered list of corpora with the label “Czech” (containing the SYN series corpora, the ORAL series, and many specialized and hosted corpora). A complete **list of all corpora in alphabetical order** appears after clicking on the label “Reset, | A list of all the corpora available to the user is accessible via the menu item **Corpora → Available corpora**. Due to the large number of corpora and their respective versions, **following the first login**, the user is shown a pre-filtered list of corpora with the label “Czech” (containing the SYN series corpora, the ORAL series, and many specialized and hosted corpora). A complete **list of all corpora in alphabetical order** appears after clicking on the label “Reset, | ||
Line 9: | Line 17: | ||
Similarly as with [[en: | Similarly as with [[en: | ||
- | === Subcorpora and parallel corpora in the favourites list === | + | ==== Subcorpora and parallel corpora in the favourites list ==== |
As a favourite item we may label not only an entire independent corpus, but also a corpus including Subcorpora or aligned groups of two or three corpora within a parallel corpus [[en: | As a favourite item we may label not only an entire independent corpus, but also a corpus including Subcorpora or aligned groups of two or three corpora within a parallel corpus [[en: | ||
- | ====== Working with subcorpora ====== | ||
- | [[en: | + | ===== Create a new subcorpus |
- | Subcorpora are tied to the user account. Virtual subcorpora are therefore available to [[en: | + | [{{ : |
- | + | ||
- | Generally speaking, a subcorpus is only an additional condition which is applied to all queries in the search. For example, if we are searching for the lemma //dřevo// in the fiction subcorpus SYN2010: | + | |
- | + | ||
- | ===== Creating a new subcorpus ===== | + | |
- | + | ||
- | [{{ : | + | |
In the case that we want to, in the long term, work only with a specific group of texts in the given corpus, it pays off to create and save our own subcorpus on the server (on the other hand, with ad hoc searches in a subgroup of texts it is better to select the option [[en: | In the case that we want to, in the long term, work only with a specific group of texts in the given corpus, it pays off to create and save our own subcorpus on the server (on the other hand, with ad hoc searches in a subgroup of texts it is better to select the option [[en: | ||
Line 31: | Line 32: | ||
- a default corpus, from which the text will be selected | - a default corpus, from which the text will be selected | ||
- a subcorpus name, an unambiguous identifier which has not been previously used in the list of existing subcorpora | - a subcorpus name, an unambiguous identifier which has not been previously used in the list of existing subcorpora | ||
+ | - FIXME if we wish the subcorpus to be searchable using the page **Corpora → Public subcorpora**, | ||
- a condition based on which we select the text for the subcorpus | - a condition based on which we select the text for the subcorpus | ||
The condition can be specified with a [[en: | The condition can be specified with a [[en: | ||
- | Within this form it is possible to select those structural attribute values that interest us. The form does not contain all the structural attributes, but only those most frequently used in the given corpus (e.g. when searching in [[en:cnk:syn2010|SYN2010]] it is [[en: | + | Within this form it is possible to select those structural attribute values that interest us. The form does not contain all the structural attributes, but only those most frequently used in the given corpus (e.g. when searching in [[en:cnk:syn2015|SYN2015]] or [[en: |
Selection is governed by the same principles as in the case of query specification according to metainformation (see description of item [[en: | Selection is governed by the same principles as in the case of query specification according to metainformation (see description of item [[en: | ||
- | ===== List of existing | + | [{{ : |
+ | |||
+ | FIXME If a subcorpus is created by selecting structural attribute values, the resulting subcorpus can be combined in a concordance query with an ad hoc selection of text type values, where the values corresponding to the content of the selected subcorpus are automatically preselected at the beginning. This makes it possible to further specify the desired text types in the subcorpus. | ||
+ | |||
+ | Another option is to mix the subcorpus according to your own criteria (e.g. 50% of texts from fiction and 50% of journalistic texts). If you want to use this feature, when creating the subcorpus, tick the desired text types within the selected attribute and then click on **Refine selection**. This will make the **Custom text type proportions** function available. So, for example, if we want a SYN2015 journalism subcorpus that contains 50% national press and 50% regional press (the default is 75% national press and 25% regional press), we check both desired genres -- NTW: national press and REG: regional press -- in the doc.genre field and narrow the selection. Then we select the Custom text type ratios function and change the ratios to 50% and 50%. The resulting subcorpus will contain randomly selected texts from both genres in the ratio we chose. FIXME | ||
+ | |||
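The mixing of text types in custom proportions can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the actual KonText procedure: proportions are counted in texts (the real unit may differ), and `mix_subcorpus`, the pool sizes and the text ids are all hypothetical.

```python
import random


def mix_subcorpus(texts_by_genre, percents, seed=0):
    """Pick random texts so that genres appear in the requested percentages.

    texts_by_genre: dict mapping a genre code to a list of text ids (toy data).
    percents: dict mapping a genre code to an integer percentage; must sum to 100.
    Counting proportions in texts is an assumption of this sketch.
    """
    if sum(percents.values()) != 100:
        raise ValueError("percentages must sum to 100")
    wanted = {g: p for g, p in percents.items() if p > 0}
    # Largest total size for which every genre can still supply its share.
    total = min(len(texts_by_genre.get(g, [])) * 100 // p for g, p in wanted.items())
    if total == 0:
        raise ValueError("no selection of texts satisfies the requested proportions")
    rng = random.Random(seed)
    return {g: rng.sample(texts_by_genre[g], total * p // 100) for g, p in wanted.items()}


# Toy pools mirroring the 75% / 25% default split mentioned above.
pools = {
    "NTW": [f"ntw-{i}" for i in range(75)],
    "REG": [f"reg-{i}" for i in range(25)],
}
selection = mix_subcorpus(pools, {"NTW": 50, "REG": 50})
```

With these toy pools the smaller genre limits the result: each genre contributes 25 texts, so the mixed subcorpus is smaller than the unmixed selection. A request that no genre pool can satisfy raises an error instead of producing a subcorpus.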
+ | <WRAP round important 60%> | ||
+ | FIXME Please be aware that the use of more than one structural attribute can easily lead to specifications that cannot be satisfied by any selection of texts from the original corpus. In this case, a subcorpus will not be created. FIXME | ||
+ | </ | ||
+ | |||
+ | ==== Creating a subcorpus draft on the concordance query page ==== | ||
+ | |||
+ | FIXME The subcorpus can also be created directly on the concordance query page under the Restrict search option. After checking the selected segments just click on the **Save as a subcorpus draft** option. To make a subcorpus active, you need to go to the menu **Corpora → My subcorpora**, find the subcorpus draft in the table and use the gear icon to open the subcorpus properties and then finalize it (see the following section). FIXME | ||
+ | ===== My subcorpora ===== | ||
+ | |||
+ | [{{ : | ||
+ | |||
+ | FIXME The section **Corpora → My Subcorpora** provides a list of all the subcorpora (or their prepared drafts) defined by the user. Next to their name in the table is also their size (in the number of [[en: | ||
+ | |||
+ | - If the subcorpus is in the draft status, you can finalize its settings (modify its structure, or add its public description) and convert it to the active status by selecting **Finalize subcorpus** on the File bar. | ||
+ | - You can change the text selection of a given subcorpus using the **Subcorpus structure** bar only for the subcorpus draft. However, if the user changes the structure of an already created subcorpus, the settings can then be simply copied to a new subcorpus with the new name using the **Save as...** option. | ||
+ | - For each subcorpus, the name can be changed, as can its public searchability, by adding or deleting the description in the **Name and public description** bar. | ||
+ | - If the user no longer plans to actively work with the subcorpus, they can archive it (using the **Archive** button on the File bar). In this case, the subcorpus will be hidden in the My Subcorpora list, will not appear on the search pages, and will not be publicly searchable. However, the URLs created for the search results will still work. If necessary, an archived subcorpus can be displayed in the My Subcorpora list again at a later time (by checking **Show archived corpora as well**) and restored to its original state. | ||
+ | - Subcorpora can also be permanently deleted by clicking the **Delete** button on the File bar. In this case, all subcorpus data is physically removed, and the existing URLs are no longer valid. This procedure is therefore more appropriate for subcorpora that have not yet been shared between users, or if there is a serious reason to remove them. | ||
+ | |||
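The subcorpus life cycle described in the list above can be summarized as a small state model. The following Python sketch is only an aid to reading: the status names, the action labels and the helper functions are invented for this example, and the assumptions that drafts and archived subcorpora can be deleted and that restoring returns a subcorpus to the active status are not confirmed by the text.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    ACTIVE = "active"
    ARCHIVED = "archived"
    DELETED = "deleted"


# Transitions taken from the list above; the action labels are this sketch's own.
TRANSITIONS = {
    (Status.DRAFT, "finalize"): Status.ACTIVE,
    (Status.ACTIVE, "archive"): Status.ARCHIVED,
    (Status.ARCHIVED, "restore"): Status.ACTIVE,   # assumed: restore means active again
    (Status.ACTIVE, "delete"): Status.DELETED,
    (Status.ARCHIVED, "delete"): Status.DELETED,   # assumption, not stated in the text
    (Status.DRAFT, "delete"): Status.DELETED,      # assumption, not stated in the text
}


def apply(status, action):
    """Return the status after an action, or raise if the action is not allowed."""
    try:
        return TRANSITIONS[(status, action)]
    except KeyError:
        raise ValueError(f"cannot {action} a subcorpus in status {status.value}") from None


def can_edit_structure(status):
    # Only a draft can have its text selection changed.
    return status is Status.DRAFT


def result_urls_work(status):
    # Archiving keeps existing result URLs valid; deleting invalidates them.
    return status in (Status.ACTIVE, Status.ARCHIVED)


def listed(status, show_archived=False):
    # Archived subcorpora appear in the list only when explicitly requested.
    if status is Status.ARCHIVED:
        return show_archived
    return status is not Status.DELETED
```

The model makes the practical difference between the two options visible: archiving hides a subcorpus but keeps shared links alive, while deleting is irreversible.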
+ | The list contains all of the user’s subcorpora. They can also be filtered by the original corpus. FIXME However, it must be repeated that subcorpora are always tied to the default (original) corpus. Therefore, if we create a fiction subcorpus from the corpus [[en: | ||
- | [{{ : | + | ==== Using subcorpora |
- | The section | + | Searching in the created subcorpus can be initiated with one click on the subcorpus in the menu **Corpora → My subcorpora** or by selecting |
- | The list contains all of the user’s corpora. However, it must be repeated that subcorpora | + | ===== Public |
- | ===== Using subcorpora | + | FIXME The results of subcorpus searches can be made available to other users by simply posting a link (assuming the users have access to the source corpus from which the subcorpus is created). However, it is also possible to share subcorpora |
- | Searching in the created | + | Each subcorpus is assigned a unique key (e.g. '' |
---- | ---- |