One of the principal features of the KonText interface is the ability to use statistical methods to identify collocations of a search word. By collocation we mean a meaningful, fixed, syntagmatic sequence of two (or more) words in close proximity to each other. A collocation consists of a key word (the node, which is usually also the KWIC) and a context word (the collocate). The list of collocation candidates for a search word or phrase forms a basis for corpus analysis, as it shows what kind of context is typical of the phenomenon under study.
Association measures are used to identify collocations. KonText currently offers the following basic measures: t-score, MI, MI3, log likelihood, min. sensitivity, logDice, MI.log_f and relative frequency. Each measure is sensitive to different kinds of phrases, and each may fail in some cases. It is therefore recommended to combine the measures and compare their output. Statistical analysis with association measures only generates a list of collocation candidates; it is up to the researcher to decide whether they are genuine collocations.
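To make the differences between the measures more concrete, the following minimal Python sketch computes several of them from raw counts. The formulas are the commonly documented definitions of these measures; whether KonText computes them in exactly this form is an assumption, and the function and parameter names (f_ab for the co-occurrence frequency, f_a and f_b for the frequencies of the node and the collocate, n for the corpus size) are hypothetical.

```python
import math

# Assumed inputs (hypothetical names, not taken from KonText):
#   f_ab  co-occurrence frequency of node and collocate
#   f_a   frequency of the node
#   f_b   frequency of the collocate
#   n     corpus size in tokens

def t_score(f_ab, f_a, f_b, n):
    # Observed minus expected co-occurrence, scaled by sqrt(observed).
    return (f_ab - f_a * f_b / n) / math.sqrt(f_ab)

def mi(f_ab, f_a, f_b, n):
    # Mutual information: log ratio of observed to expected co-occurrence.
    return math.log2(f_ab * n / (f_a * f_b))

def mi3(f_ab, f_a, f_b, n):
    # MI with the co-occurrence frequency cubed, which favours frequent pairs.
    return math.log2(f_ab ** 3 * n / (f_a * f_b))

def mi_log_f(f_ab, f_a, f_b, n):
    # MI weighted by the natural logarithm of the co-occurrence frequency.
    return mi(f_ab, f_a, f_b, n) * math.log(f_ab + 1)

def log_dice(f_ab, f_a, f_b):
    # logDice does not depend on the corpus size.
    return 14 + math.log2(2 * f_ab / (f_a + f_b))

def min_sensitivity(f_ab, f_a, f_b):
    # The smaller of the two conditional relative frequencies.
    return min(f_ab / f_a, f_ab / f_b)

if __name__ == "__main__":
    # Made-up counts, used for illustration only.
    counts = dict(f_ab=8, f_a=16, f_b=32, n=1024)
    for fn in (t_score, mi, mi3, mi_log_f):
        print(fn.__name__, round(fn(**counts), 3))
    print("log_dice", round(log_dice(8, 16, 32), 3))
    print("min_sensitivity", min_sensitivity(8, 16, 32))
```

Running the same counts through several functions shows why the measures rank candidates differently: MI rewards rare but exclusive pairs, whereas t-score and MI3 give more weight to the absolute co-occurrence frequency.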
Suppose that we have created a concordance of the lemma dřevo in the corpus SYN2010. Clicking the Collocation item in the menu opens a form for collocation analysis. In the form, the following values can be specified for the collocation search within the created concordance:
Based on the submitted specifications, the lemma dřevo co-occurs with 2386 different words (attribute word) that can function as its collocates. Sorting by logDice produces a list with the following forms as the most significant collocation candidates: tvrdého, bázi, kus, kusy, dubového…
The list comprises both the overall co-occurrence frequency of the search phenomenon and its collocate (e.g. of the lemma dřevo and the collocate tvrdého) and the values of the selected association measures for that collocation. Clicking a column header re-sorts the list by that value. Just as in the frequency distribution list, the p/n link in the collocate list creates a positive or negative filter, which searches for the collocate in the proximity of the original KWIC.
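The effect of the positive and negative filter can be illustrated with a small Python sketch. It is not how KonText implements the filter; the data structure (a token list plus the position of the KWIC), the function names and the default window of one token on each side are all assumptions made for illustration.

```python
from typing import List, Tuple

# A concordance line is modelled as (tokens, index of the KWIC token).
# This representation is hypothetical, chosen only for the example.
Line = Tuple[List[str], int]

def has_collocate(line: Line, collocate: str, left: int = 1, right: int = 1) -> bool:
    """Return True if the collocate occurs within the window around the KWIC."""
    tokens, kwic = line
    start = max(0, kwic - left)
    end = min(len(tokens), kwic + right + 1)
    # The KWIC token itself is excluded from the window.
    window = tokens[start:kwic] + tokens[kwic + 1:end]
    return collocate in window

def filter_lines(lines: List[Line], collocate: str, positive: bool = True,
                 left: int = 1, right: int = 1) -> List[Line]:
    """Positive filter keeps lines with the collocate; negative filter drops them."""
    return [ln for ln in lines
            if has_collocate(ln, collocate, left, right) == positive]

# Made-up concordance lines for the lemma dřevo.
lines: List[Line] = [
    (["kus", "tvrdého", "dřeva"], 2),
    (["dřevo", "hoří"], 0),
    (["z", "dubového", "dřeva"], 2),
]

kept = filter_lines(lines, "tvrdého", positive=True)
dropped = filter_lines(lines, "tvrdého", positive=False)
```

In this sketch the positive filter keeps only the line in which tvrdého stands directly next to the KWIC, while the negative filter returns the remaining two lines.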
Two precautions must be mentioned here: