Differences

This shows you the differences between two versions of the page.

--- en:manualy:kontext:kolokace [2016/09/12 16:35] – [Collocation list] jankocek
+++ en:manualy:kontext:kolokace [2026/06/08 17:44] (current) – [Menu: Collocation] michalkren
@@ Line 3: / Line 3: @@
 [{{ :en:manualy:kontext:kolokace-form.png?direct&300|Form for specification of analysis of collocation candidates }}]
-One of the principal properties of [[en:manualy:kontext:index|interface KonText]] is the possibility to use statistical methods to identify [[en:pojmy:kolokace|collocations]] of a wanted word. By collocation, we understand a meaningful, fixed, syntagmatic sequence of two (or more) words in the immediate proximity. A collocation consists of a key word (**node** which  usually is also [[en:pojmy:kwic|KWIC]]) and a contextual word (**collocate**). The  list of collocation candidates with which a wanted word or a phrase collocates forms the basis for corpus analysis, as it enables us to determine what kind of context is typical for a wanted phenomenon.
+One of the principal properties of [[en:manualy:kontext:index|interface KonText]] is the possibility to use statistical methods to identify [[wp>Collocation|collocations]] of a wanted word. By collocation, we understand a meaningful, fixed, syntagmatic sequence of two (or more) words in the immediate proximity. A collocation consists of a key word (**node** which  usually is also [[en:pojmy:kwic|KWIC]]) and a contextual word (**collocate**). The  list of collocation candidates with which a wanted word or a phrase collocates forms the basis for corpus analysis, as it enables us to determine what kind of context is typical for a wanted phenomenon.
 [[en:pojmy:asociacni_miry|Association measures]] are used to identify a collocation. [[en:manualy:kontext:index|Interface KonText]] presently employs the following basic ones: t-score, MI, MI3, log likelihood, min. sensitivity, logDice, MI.log_f, relative frequency. Each of the measures is sensitive to different kinds of phrases and each might not work in some cases. It is therefore recommended to combine the measures and compare their output. The statistical analysis by association measures generates a list of collocation **candidates** and it is up to the researcher to decide whether they really are legitimate collocations.
+Note: because of Manatee limitations, **structures are ignored when computing collocations**, i.e. the key word and the contextual word may be in different sentences. When the same word occurs multiple times in one context span, it is counted only once.
 Suppose that we [[en:manualy:kontext:novy_dotaz|created a concordance]] of lemma //dřevo// in the corpus [[en:cnk:syn2010|SYN2010]]. Clicking the Collocation item in the menu opens a form for collocation analysis. In the form, the following values can be specified when searching for collocations within the scope of the created concordance:
@@ Line 13: / Line 15: @@
   - **In the range from - to**: specification of the contextual span (in the proximity of [[en:pojmy:kwic|KWIC]]) where the collocates will be searched for (negative numbers indicate the positions preceding KWIC, while the positive ones follow KWIC, cf. [[en:manualy:kontext:frekvencni_distribuce#frekvencni_distribuce_podle_pozicnich_atributu|frequency distribution]])
   - **Minimum frequency in corpus**: establishes the minimum overall frequency a unit must have in order to be included in the collocate list (if the minimum frequency is set to 5, items that occur fewer than 5 times in the whole corpus cannot appear among the collocates of lemma //dřevo//)
-  - **Minimum frequency in given range**: provided that we specified the context span for collocate search from -3 to 3, then the minimum frequency in given range optiom determines how frequently should an item co-occur with KWIC to be included in the collocate list (when calculating the association measures only those items will be taken into consideration which occur at least 3 times in the proximity of KWIC, lemma //dřevo// in our examle)
+  - **Minimum frequency in given range**: provided that we specified the context span for collocate search from -3 to 3, then the minimum frequency in given range option determines how frequently should an item co-occur with KWIC to be included in the collocate list (when calculating the association measures only those items will be taken into consideration which occur at least 3 times in the proximity of KWIC, lemma //dřevo// in our example)
   - **Show functions**: which association measures will be calculated and listed for each collocate that meets the conditions specified above
   - **Sort by**: according to which of the association measures will the list be sorted (especially useful for the long lists)
@@ Line 35: / Line 37: @@
 <WRAP center round box 48%>
-**[[en:manualy:kontext:index|Menu]]**: [[en:manualy:kontext:novy_dotaz|Query]] • [[en:manualy:kontext:subkorpus|Corpora]] • [[en:manualy:kontext:ulozit|Save]] • [[en:manualy:kontext:konkordance|concordance]] • [[en:manualy:kontext:filtr|Filter]] • [[en:manualy:kontext:frekvencni_distribuce|Frequency]] • [[[[en:manualy:kontext:kolokace|Collocation]] • [[en:manualy:kontext:moznosti_zobrazeni|View]] • [[en:manualy:kontext:napoveda|Help]]
+**[[en:manualy:kontext:index|Menu]]**: [[en:manualy:kontext:novy_dotaz|Query]] • [[en:manualy:kontext:korpusy|Corpora]] • [[en:manualy:kontext:ulozit|Save]] • [[en:manualy:kontext:konkordance|Concordance]] • [[en:manualy:kontext:filtr|Filter]] • [[en:manualy:kontext:frekvence|Frequency]] • [[en:manualy:kontext:kolokace|Collocation]] • [[en:manualy:kontext:zobrazeni|View]] • [[en:manualy:kontext:napoveda|Help]]
 </WRAP>
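
The form options and the counting note in the page text above can be illustrated with a minimal sketch. It is not KonText or Manatee code: the function name ''collocation_candidates'' and all of its parameters are hypothetical, the corpus is a plain list of tokens, and the formulas are the commonly cited definitions of t-score, MI, MI3, logDice and minimum sensitivity, which the actual implementation may compute differently. The sketch ignores sentence boundaries, counts a word once per context span, and applies both minimum-frequency filters before sorting.

```python
import math
from collections import Counter


def collocation_candidates(tokens, node, span=(-3, 3),
                           min_corpus_freq=1, min_range_freq=1,
                           sort_by="logDice"):
    """Return collocation candidates of `node`, sorted by one measure.

    Hypothetical helper for illustration only; not part of KonText.
    """
    n = len(tokens)
    corpus_freq = Counter(tokens)      # overall frequency of every token
    f_a = corpus_freq[node]            # frequency of the node (KWIC)

    # Count co-occurrences; structures such as sentences are ignored.
    cooc = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i + span[0])
        hi = min(n - 1, i + span[1])
        # A set means each word is counted only once per context span.
        seen = {tokens[j] for j in range(lo, hi + 1) if j != i}
        cooc.update(seen)

    rows = []
    for word, f_ab in cooc.items():
        f_b = corpus_freq[word]
        # "Minimum frequency in corpus" and "Minimum frequency in given range"
        if f_b < min_corpus_freq or f_ab < min_range_freq:
            continue
        expected = f_a * f_b / n       # co-occurrences expected by chance
        rows.append({
            "word": word,
            "freq": f_ab,
            "t-score": (f_ab - expected) / math.sqrt(f_ab),
            "MI": math.log2(f_ab / expected),
            "MI3": math.log2(f_ab ** 3 / expected),
            "logDice": 14 + math.log2(2 * f_ab / (f_a + f_b)),
            "min. sensitivity": min(f_ab / f_a, f_ab / f_b),
        })
    # "Sort by": order the list by the chosen measure, highest first.
    return sorted(rows, key=lambda r: r[sort_by], reverse=True)


if __name__ == "__main__":
    toy_corpus = "x wood y x wood x z z".split()
    for row in collocation_candidates(toy_corpus, "wood", span=(-1, 1)):
        print(row)
```

Sorting the same candidate list by different measures (for example ''sort_by="t-score"'' versus ''sort_by="MI"'') is a simple way to compare their output, as the page recommends.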
