en:manualy:kontext:konkordance [2023/05/17 16:28] – jankocek | en:manualy:kontext:konkordance [2024/11/05 11:49] (current) – [Navigation with concordance editing sequence] michalskrabal |
---|
| |
[{{:en:manualy:kontext:konkordance-liska.png?direct&450 |Concordance list}}] | [{{:en:manualy:kontext:konkordance-liska.png?direct&450 |Concordance list}}] |
The basic type of query evaluation is [[en:pojmy:konkordance|concordance]], in particular, the concordance list. It is a list of all verbs or phrases which fit the ([[en:pojmy:kwic|KWIC]]) query, along with their right and left contexts, possibly also with information about the source text. A long concordance list is usually divided into several pages, and it is possible to switch between them with the help of the arrows placed in the header and footer of the concordance list. Display parameter settings (the number of rows on a page, length of context etc.) can be changed with the [[en:manualy:kontext:moznosti_zobrazeni|View]] option. | The basic kind of query is [[en:pojmy:konkordance|concordance]], in particular, a concordance list. It is a list of all words or phrases which fit the ([[en:pojmy:kwic|KWIC]]) query, along with their right and left contexts, possibly also with information about the source text. A long concordance list is usually divided into several pages, and it is possible to switch between them with the help of the arrows placed in the header and footer of the concordance list. Display parameter settings (the number of rows on a page, length of context etc.) can be changed using the [[en:manualy:kontext:moznosti_zobrazeni|View]] menu. |
| |
| The following sections describe the functionality of the page, basic work with a concordance list, sorting, shuffling the list and creating a sample. Further work with the concordance (i.e. [[en:manualy:kontext:filtr|filtering]], [[en:manualy:kontext:frekvence|frequency]] and [[en:manualy:kontext:kolokace|collocation]] analysis) is covered on separate pages corresponding to other menu items. |
| |
The following sections describe the functionality of the page, basic work with a concordance list, sorting, shuffling the list and creating a sample. Further work with the concordance, i.e. ([[en:manualy:kontext:filtr|filtering]], [[en:manualy:kontext:frekvence|frequency]], and [[[[en:manualy:kontext:kolokace|collocation]] analysis), is devoted to separate pages corresponding to other menu items. | |
===== Navigation with concordance editing sequence ===== | ===== Navigation with concordance editing sequence ===== |
| |
Under the KonText logo, there is navigation that provides basic information about the searched corpus (item **Corpus**) and allows to modify the initial query (item **Query**) as well as to change any subsequent modifications above the concordance list (shuffling and other items corresponding to the modifications). | Under the KonText logo, a navigation overview provides basic information about the searched corpus (item **Corpus**) and allows modifying the initial query (item **Query**), as well as changing any subsequent actions performed on the concordance list (e.g. shuffling, filters etc.). |
| |
| |
\\ | \\ |
\\ | \\ |
The text of the link in the Query item always shows the query entered by the user in a simplified way, followed by the number of occurrences in parentheses. The query can be displayed (including the selected corpus and all conditions) and edited using the link, i.e. its parameters can be changed directly above the searched concordance (without the need to enter a new query). This feature is especially useful if you want to refine or update the original query. When editing a query, you must also decide whether the following operations should be performed on the concordance. They include a useful shuffle option (this can be run once from the menu **Concordance → Shuffle** or set as default in the menu **View → General view options**, see the section below for details), so it is usually a good idea to leave the option **Perform automatically also subsequent operations** checked. | The text of the link in the **Query** item always shows a simplified representation of the query entered by the user, followed by the number of occurrences in parentheses. The query can be displayed and edited using the link, i.e. its parameters can be changed directly above the search results (without the need to enter a new query). This feature is especially useful if you want to refine or update the original query. When editing a query, you must also decide whether the following operations should be performed on the concordance. These include a useful shuffle option (this can be switched on directly in the query form or run later from the menu **Concordance → Shuffle**), so it is usually a good idea to leave the option **Perform automatically also subsequent operations** checked. |
| |
The following line items represent the steps in the edit sequence that change the form and extent of the concordance. These can be handled in a similar way to the initial query. | The subsequent navigation items represent the steps in the edit sequence that change the form and extent of the concordance. These can be manipulated in a similar way to the initial query. |
| |
[{{:en:manualy:kontext:popis-dotazu_en.png?direct&600|Description of a more complex query }}] | [{{:en:manualy:kontext:popis-dotazu_en.png?direct&600|Description of a more complex query }}] |
| |
| |
The query overview with the sequence of all concordance modifications is also available in a tabular form by clicking the last link **Details**. | The query overview with the sequence of all concordance modifications is also available in tabular form by clicking the last link, **Details**. |
| |
Besides editing the query, the user can also return to the individual steps and intermediate phases, renewing the form of the concordance from any of the previous steps. E.g. we can easily return to the default (and subsequently shuffled) concordance list using the **View** link on the second line with the shuffle operation. | Besides editing the query, the user can also return to the form of the concordance from any of the previous steps and intermediate phases. E.g. we can easily return to the default concordance list (including shuffling) using the **View** link in the second row with the shuffle operation. |
| |
Query overview thus allows for an exact query specification for future use, e.g. in a research report, scientific paper etc. It is a more complex and more general variant of the simple //Undo// button in a web browser. While the Query overview allows you to browse and edit operations on a concordance, the Undo button allows you to browse (but only in a fixed order) both the concordance and, say, the frequency or collocation analysis results. | The query overview thus allows for an exact query specification for future use, e.g. in a research report, scientific paper etc. It is a more complex and more general variant of the web browser's simple built-in //Undo// button. While the Query overview allows you to browse and edit operations on a concordance, the Undo button allows you to browse (but only in a fixed order) both the concordance and, say, the frequency or collocation analysis results. |
| |
===== Working with concordance ===== | ===== Working with concordances ===== |
| |
==== Introduction ==== | ==== Introduction ==== |
| |
The concordance list is divided into several parts. Highlighted in the centre is the [[en:pojmy:kwic|KWIC]], which is surrounded by the left and right context. In the left-hand column each row contains some brief information about the source text (its content depends on the settings in the [[en:manualy:kontext:moznosti_zobrazeni|View]]) menu. In the header of the concordance we can find basic information such as absolute [[en:pojmy:frekvence|frequency]], relative frequency ([[en:pojmy:ipm|i.p.m.]]), [[en:pojmy:arf|ARF]] and concordance status (whether it is [[en:manualy:kontext:konkordance#trideni|sorted]] or [[en:manualy:kontext:konkordance#promichat|shuffled]]). Arrows for browsing the individual pages of the concordance are placed on the far right. | The concordance list is divided into several parts. Highlighted in the centre is the [[en:pojmy:kwic|KWIC]], which is surrounded by the left and right context. In the left-hand column, each row contains some brief information about the source text (its content depends on the settings in the [[en:manualy:kontext:moznosti_zobrazeni|View]] menu). In the header of the concordance, we can find basic information such as absolute [[en:pojmy:frekvence|frequency]], relative frequency ([[en:pojmy:ipm|i.p.m.]]), [[en:pojmy:arf|ARF]] and concordance status (whether it is [[en:manualy:kontext:konkordance#sorting|sorted]] or [[en:manualy:kontext:konkordance#shuffle|shuffled]]). Arrows for browsing the individual pages of the concordance are placed on the far right. |
| |
==== Further information about the text ==== | ==== Further information about the text ==== |
| |
More detailed **information about the text** from which the specific concordance line originates is displayed after clicking on the metainformation listed in colour on the left of each line. The detailed metainformation then appears in the window at the bottom of the concordance list, which contains all of the information about the given text and the structures in which the KWIC is found (see [[en:seznamy:index|lists of abbreviations used]]). | More detailed **information about the text** from which the specific concordance line originates is displayed after clicking on the metainformation listed in colour on the left of each line. The detailed metainformation then appears in a window at the bottom of the concordance list, which contains all of the information about the given text and the structures in which the KWIC is found (see [[en:seznamy:index|lists of abbreviations used]]). |
| |
[{{:en:manualy:kontext:konkordance_rozsirenikontextu.png?direct&700|Concordance with wider context}}] | [{{:en:manualy:kontext:konkordance_rozsirenikontextu.png?direct&700|Concordance with wider context}}] |
\\ | \\ |
| |
==== Text surroundings of the KWIC ==== | ==== Display context of the KWIC ==== |
| |
The text surroundings of the key word can be widened either for all concordance lines (see menu [[en:manualy:kontext:moznosti_zobrazeni#obecne_volby_zobrazeni_konkordance|View → General view options]], function //KWIC Context size (positions)//) or it is possible to view the **wider context** of only one concordance line in more detail in a separate window which appears after we click on KWIC. Here it is possible (to a limited degree) to widen the context with the help of the blue arrows at the beginning and at the end of the sample. | The context of the key word can be widened either for all concordance lines (see menu [[en:manualy:kontext:moznosti_zobrazeni#obecne_volby_zobrazeni_konkordance|View → General view options]], option //KWIC Context size (positions)//), or it is possible to view the **wider context** of only one concordance line in more detail in a separate window which appears after clicking on the KWIC. Here it is possible (to a limited degree) to widen the context with the help of the blue arrows at the beginning and at the end of the sample. |
| |
In **newer written corpora** (starting with [[[en:cnk:syn2020|SYN2020]] and [[en:cnk:syn:verze9|SYN verze9]]) it is possible to switch between two window display options: a default view corresponding to the [[en:pojmy:token|tokenization]] of the corpus, or the formatted text view preserving to some extent also the typographic form of the source text (this can be used, for example, to display poetry more clearly). | In **newer written corpora** (starting with [[en:cnk:syn2020|SYN2020]] and [[en:cnk:syn:verze9|SYN version 9]]), it is possible to switch between two text display options: a default view corresponding to the [[en:pojmy:token|tokenization]] of the corpus, or a formatted text view that also preserves, to some extent, the typographic form of the source text (this can be used, for example, to display poetry more clearly). |
| |
[{{:en:manualy:kontext:dialog_p_en.png?&direct&450 |Expanding the context in spoken corpora and its representation by utterances FIXME}}] | [{{:en:manualy:kontext:dialog_p_en.png?&direct&450 |Expanded context in a spoken corpus, visualized as speaker segments}}] |
In **spoken corpora**, switching between two display options -- linear display (as in written corpora) or display by individual speakers’ utterances -- is possible too. Obviously, the Speech view makes it easier to navigate through spoken language transcripts. | In **spoken corpora**, switching between two display options -- linear display (as in written corpora) or display by individual speakers’ segments -- is possible too. Obviously, the Speech view makes it easier to navigate through spoken language transcripts. |
| |
==== Syntactic graph ==== | ==== Syntactic graph ==== |
| |
If the corpus is [[en:pojmy:syntakticka_analyza|tagged syntatically]] (e.g. [[en:cnk:syn2015|SYN2015]]), there is an icon {{:manualy:kontext:syntax-tree-icon.png?nolink&20|}} between the checkbox and the text meta-information, used to show the **syntactic graph** of the given sentence. | If the corpus is [[en:pojmy:syntakticka_analyza|syntactically tagged]] (e.g. [[en:cnk:syn2015|SYN2015]]), clicking the {{:manualy:kontext:syntax-tree-icon.png?nolink&20|}} icon between the checkbox and the text meta-information will show the **syntactic graph** of the given sentence. |
| |
==== Manual labelling of concordance lines ==== | ==== Manual labelling of concordance lines ==== |
| |
At the far left of every line we can also find a selection box for **manual labelling** of the individual concordance lines. | At the far left of every line, we can also find a selection box for **manual labelling** of the individual concordance lines. |
| |
[{{ :en:manualy:kontext:podily_skupin.png?450|Portions of labelled groups of lines}}] | [{{ :en:manualy:kontext:podily_skupin.png?450|Portions of labelled groups of lines}}] |
| |
* **basic**: selection of specific concordance lines with the option of either selecting or not selecting each specific item. | * **basic**: selection of specific concordance lines with the option of either selecting or not selecting each specific item. |
* **groups**: a general classification of concordance lines into groups (e.g. according to meaning) labelled by numbers which the user selects | * **groups**: a general classification of concordance lines into groups (e.g. according to meaning) labelled by numbers which the user picks |
| |
During the labelling in both modes a link with the continuously updated number of selected lines appears in the header of the concordance list. The two modes differ in the options of further work with the labelled lines which are available after clicking on the link: while the basic selection only leads to the deletion or preservation of the given lines in the concordance, the options of working with groups are much wider. In this case, however, the first step required by Kontext is to save the given group classification, which is necessary for maintaining persistence– after saving the URL changes so that it remains as an unambiguous indicator of the input query and also the classification. Following this, it is possible to view some basic statistics of the given groups, sorting concordance according to them, or also return to further editing of the group classification. Switching to the page with the first selected line at any time is convenient for large, multi-page concordances. | During labelling in both modes, a link with the continuously updated number of selected lines appears in the header of the concordance list. The two modes differ in what can be done with the labelled lines after clicking on the link: the basic selection only allows the given lines to be removed from or kept in the concordance, whereas groups offer many more options. In this case, however, the first step required by KonText is to save the given group classification, which is necessary for persistence: after saving, the URL changes so that it unambiguously identifies both the input query and the classification. Following this, it is possible to view some basic statistics of the given groups, sort the concordance according to them, or return to further editing of the group classification. The option to jump at any time to the page with the first selected line is convenient for large, multi-page concordances. |
| |
===== Functions of menu items ===== | ===== Items in the //Concordance// menu ===== |
| |
==== Sorting ==== | ==== Sorting ==== |
- Multilevel sorting | - Multilevel sorting |
| |
For the **simple sorting** we select the criterion based on which we are doing the sorting (we can choose from any [[en:pojmy:atributy_pozicni|positional]] or [[en:pojmy:atributy_strukturni|structural]] attribute) and the range of the sorting (whether we are sorting the [[pojmy:kwic|KWIC]], the right or left context). As a result we can have alphabetically sorted concordances e.g. according to the first preceding word, according to the form of the keyword, or according to the [[en:pojmy:txtype|text type]]. | For **simple sorting**, we select the criterion based on which we are doing the sorting (we can choose from any [[en:pojmy:atributy_pozicni|positional]] or [[en:pojmy:atributy_strukturni|structural]] attribute) and the range of the sorting (whether we are sorting the [[pojmy:kwic|KWIC]], the right or left context). As a result, we can have alphabetically sorted concordances e.g. according to the first preceding word, according to the form of the keyword, or according to the [[en:pojmy:txtype|text type]]. |
| |
The option **Number of tokens to sort** determines the range of the context or KWIC (if it is a multi-word one) on which the sorting mechanism will focus. If we select the value 2 for the right context, the results will be sorted alphabetically according to the first and second words following the KWIC. | The option **Number of tokens to sort** determines the range of the context or KWIC (if it is a multi-word one) on which the sorting mechanism will focus. If we select the value 2 for the right context, the results will be sorted alphabetically according to the first and second words following the KWIC. |
| |
The options **Ignore case** and **Backward** are applied to both the simple and the multilevel sort. The option Ignore case determines whether lower-/uppercase ([[wp>Case_sensitivity|case-sensitive]]) will be distinguished during the sorting or whether they will count as one and the same symbol ([[wp>Case_sensitivity|case-insensitive]]). The second option enables traditional alphabetical sorting (unmarked) or retrograde sorting, i.e. alphabetical order according to the back of a word (not according to its beginning, as is usual). | The options **Ignore case** and **Backward** are applied to both the simple and the multilevel sort. The option Ignore case determines whether lower-/uppercase will be distinguished during the sorting ([[wp>Case_sensitivity|case-sensitive]]) or whether they will count as one and the same symbol ([[wp>Case_sensitivity|case-insensitive]]). The second option enables traditional alphabetical sorting (unmarked) or retrograde sorting, i.e. alphabetical ordering starting from the end of words (instead of their beginning, as is usual). |
| |
**Multilevel sort** allows for the combination of all possible types of sorting into a hierarchy of maximally three levels. It is therefore possible to sort a concordance on the 1st level according to text type, and concordances with the same text type are then sorted on the 2nd level (e.g. according to the first right-hand context word) and on the 3rd level according to the keyword itself. | **Multilevel sort** allows for the combination of all possible types of sorting into a hierarchy of maximally three levels. It is therefore possible to sort a concordance on the 1st level according to text type, and concordances with the same text type are then sorted on the 2nd level (e.g. according to the first right-hand context word) and on the 3rd level according to the keyword itself. |
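The sorting options described above can be illustrated with a minimal Python sketch. It is not KonText's implementation: the dictionary layout of a concordance line and the names `token_key` and `multilevel_sort` are hypothetical, and plain string comparison is used instead of locale-aware alphabetical ordering. The sketch shows how **Number of tokens to sort**, **Ignore case** and **Backward** shape a sort key, and how up to three such keys combine into one multilevel sort.

```python
def token_key(tokens, n=1, ignore_case=False, backward=False):
    """Build a sort key from the first n tokens of a context or KWIC."""
    key = []
    for token in tokens[:n]:
        if ignore_case:
            token = token.lower()
        if backward:
            # Retrograde sorting: compare words from their end.
            token = token[::-1]
        key.append(token)
    return tuple(key)


def multilevel_sort(lines, levels):
    """Sort lines by one to three key functions, the first being most significant."""
    if not 1 <= len(levels) <= 3:
        raise ValueError("expected one to three sorting levels")
    return sorted(lines, key=lambda line: tuple(level(line) for level in levels))


# Hypothetical concordance lines (illustrative data only).
lines = [
    {"txtype": "NFC", "kwic": ["dog"], "right": ["runs", "fast"]},
    {"txtype": "FIC", "kwic": ["Dog"], "right": ["sleeps", "here"]},
    {"txtype": "FIC", "kwic": ["dog"], "right": ["barks", "loudly"]},
]

levels = [
    lambda line: line["txtype"],                             # 1st level: text type
    lambda line: token_key(line["right"], n=1),              # 2nd level: first word to the right
    lambda line: token_key(line["kwic"], ignore_case=True),  # 3rd level: the keyword itself
]
sorted_lines = multilevel_sort(lines, levels)
```

In this sketch the two lines of text type FIC come first, ordered by their first right-hand word, followed by the NFC line.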
==== Shuffle ==== | ==== Shuffle ==== |
| |
In the default settings, the concordance is ordered according to the order in which the search results (individual concordance lines) are found in the corpus (e.g. in the corpus [[en:cnk:syn2015|SYN2015]] the first texts are fiction, then non-fiction and finally journalistic). This has the advantage of making it quicker to find matching rows. Nonetheless, if the concordance is extensive, and one needs to acquire a representative sample (e.g. for manual analysis), it is preferable to work with randomly shuffled lines. This can be done with the **Concordance → Shuffle** option. The resulting shuffle of the concordance lines is random yet repeatable. | In the default settings, the concordance is ordered according to the order in which the search results (individual concordance lines) are found in the corpus (e.g. in the corpus [[en:cnk:syn2015|SYN2015]] the first texts are fiction, then non-fiction and finally journalistic). This has the advantage of quickly displaying the first page of results, while more matches are being retrieved in the background. Nonetheless, if the concordance is extensive, and one needs to acquire a representative sample (e.g. for manual analysis), it is preferable to work with randomly shuffled lines. The resulting shuffle of the concordance lines is random yet repeatable. |
| |
Using the Shuffle option automatically is recommended, as it ensures that every concordance, before being displayed, is randomized in this way. The permanent setting of concordance lines shuffling is achievable in the **View → General view options** menu (the //Shuffle concordance lines by default// option). Such an approach functions as an effective prevention against drawing incorrect conclusions from studying a sample of results which originate from an unrepresentative set of texts. | Shuffling the concordance lines can be achieved in two ways: |
| |
If concordance lines shuffling is enabled by default, the Shuffle option will perform another random rearrangement. For each concordance, an explicit shuffling algorithm causes the results after the first, second, third... nth shuffle to match on repeated attempts on the same query. This guarantees the repeatability of experiments on corpora even when using a shuffled concordance. | - Before starting the search, by selecting the **Shuffle concordance lines** option on the **Query → Concordance** page directly in the search form next to the **Search** button. |
| - After evaluating the query, by selecting **Concordance → Shuffle**. |
| |
| Using the Shuffle option by default is recommended, as it ensures that every concordance is randomized before being displayed. (Once switched on, the option remains active for subsequent queries.) This approach effectively guards against drawing incorrect conclusions from a sample of results that originates from an unrepresentative set of texts. |
| |
| <wrap lo>Note: It is recommended to disable this option only temporarily in cases where a large concordance is expected, typically when searching for a frequent phenomenon in a corpus of billions of tokens. Particularly for these larger results, one should consider that turning on shuffling the lines can significantly increase the time spent waiting for the concordance to be displayed.</wrap> |
| |
| If the concordance lines have already been shuffled, the **Concordance → Shuffle** option will perform another random rearrangement. For each concordance, a reproducible shuffling algorithm causes the results after the first, second, third... n-th shuffle to match on repeated attempts with the same query. This guarantees the repeatability of experiments on corpora even when using a shuffled concordance. |
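A shuffle that is "random yet repeatable" can be obtained, for example, by seeding a pseudo-random generator with a fixed value. The following Python sketch only illustrates that general idea; how KonText actually seeds or implements its shuffle is not stated here, and the function name `shuffle_lines` and the seed value are assumptions.

```python
import random


def shuffle_lines(lines, seed=0):
    """Return a shuffled copy of lines; the same input and seed give the same order."""
    rng = random.Random(seed)  # fixed seed (assumption) makes the permutation repeatable
    shuffled = list(lines)     # copy, so the original order is left untouched
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical concordance lines in corpus order.
concordance = [f"line {i}" for i in range(10)]

first = shuffle_lines(concordance)  # result of the first shuffle
second = shuffle_lines(first)       # shuffling again rearranges the lines once more
```

Because every step is deterministic, repeating the same query and applying the same number of shuffles reproduces the same order, which is the property that makes experiments on a shuffled concordance repeatable.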
| |
==== Sample==== | ==== Sample ==== |
| |
An alternative to the shuffle, especially when working with an extensive concordance, is the creation of random samples (**Concordance → Sample**). The main advantage of this approach is the fact that an extensive concordance can be randomly reduced to an extent that it will be within the user's powers to analyze it. When we limit the concordance range in this way, it naturally also influences the absolute frequency of the results. However, if the sample is large enough, the relative frequency (i.e. the proportions between the studied phenomena) should remain preserved. | An alternative to the shuffle, especially when working with an extensive concordance, is the creation of random samples (**Concordance → Sample**). The main advantage of this approach is the fact that an extensive concordance can be randomly reduced to an extent that will be within the user's powers to analyze. When we limit the concordance range in this way, it naturally also influences the absolute frequency of the results. However, if the sample is large enough, the relative frequency (i.e. the proportions between the studied phenomena) should remain preserved. |
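The effect of sampling on absolute and relative frequencies can be sketched in Python as follows. This is an illustration under stated assumptions, not a description of how KonText draws its samples: the helper names `sample_lines` and `proportions`, the fixed seed and the text-type counts are all hypothetical.

```python
import random
from collections import Counter


def sample_lines(lines, size, seed=0):
    """Draw a repeatable random sample of at most `size` lines."""
    rng = random.Random(seed)  # fixed seed (assumption) keeps the sample repeatable
    if size >= len(lines):
        return list(lines)
    return rng.sample(lines, size)


def proportions(labels):
    """Relative share of each label, e.g. of each text type."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical concordance of 10,000 lines, each represented by its text type.
labels = ["fiction"] * 6000 + ["non-fiction"] * 3000 + ["journalism"] * 1000
sample = sample_lines(labels, 1000)

full_shares = proportions(labels)    # shares in the whole concordance
sample_shares = proportions(sample)  # shares in the sample; close to the above if the sample is large enough
```

The absolute counts in the sample are necessarily smaller than in the full concordance, while the shares computed by `proportions` can be compared directly between the two.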
| |
| |
==== Permanent link ==== | ==== Permanent link ==== |
**[[en:manualy:kontext:index|Menu]]**: [[en:manualy:kontext:novy_dotaz|Query]] • [[en:manualy:kontext:korpusy|Corpora]] • [[en:manualy:kontext:ulozit|Save]] • [[en:manualy:kontext:konkordance|Concordance]] • [[en:manualy:kontext:filtr|Filter]] • [[en:manualy:kontext:frekvence|Frequency]] • [[[[en:manualy:kontext:kolokace|Collocation]] • [[en:manualy:kontext:zobrazeni|View]] • [[en:manualy:kontext:napoveda|Help]] | **[[en:manualy:kontext:index|Menu]]**: [[en:manualy:kontext:novy_dotaz|Query]] • [[en:manualy:kontext:korpusy|Corpora]] • [[en:manualy:kontext:ulozit|Save]] • [[en:manualy:kontext:konkordance|Concordance]] • [[en:manualy:kontext:filtr|Filter]] • [[en:manualy:kontext:frekvence|Frequency]] • [[en:manualy:kontext:kolokace|Collocation]] • [[en:manualy:kontext:zobrazeni|View]] • [[en:manualy:kontext:napoveda|Help]] |
</WRAP> | </WRAP> |
| |
| |
| |