Differences

This shows you the differences between two versions of the page.

--- en:manualy:kontext:novy_dotaz [2023/07/27 11:02] – jankocek
+++ en:manualy:kontext:novy_dotaz [2024/02/12 15:58] (current) – jankocek
@@ Line 141: / Line 141: @@
   * whitelist -- a list of pre-selected words (in a separate file) which we want to see in the resulting list
   * blacklist -- a list of pre-selected words (in a separate file) which we want to exclude from the resulting list
 Among the output option settings we can find a selection of either the absolute [[en:pojmy:frekvence|frequency]], [[en:pojmy:arf|ARF]] or a document count. Furthermore there is also the possibility to choose a specific output attribute (or attributes). These attributes **need not be** identical to the positional attribute selected in the top section of the form, on which all the above mentioned filters are applied. This enables us to create e.g. a frequency list of all verbs by selecting the attribute [[en:pojmy:tag|tag]] in the top section, applying the condition for a verb as in [[en:seznamy:tagy#pozice_1_-_slovni_druh|V.*]] and finally by "switching" the output type to [[en:pojmy:lemma|lemma]] – an example of such a query is shown in the picture.
+===== Keyword analysis =====
+[{{ :en:manualy:kontext:analyza_k_slov_en.png?direct&400|List of keywords in the ORAL v1 corpus compared to the SYN2020 reference corpus}}]
+The KonText interface can generate a list of keywords, i.e., forms or lemmas that appear in the selected (sub)corpus significantly more often than in a reference (sub)corpus, which stands for common language usage. (To analyse keywords in users’ own texts, use [[en:manualy:kwords|the specialized KWords application]].)
+Besides the corpus in which we want to find keywords, we also have to specify a reference corpus or subcorpus (e.g. to compare a corpus consisting mainly of journalistic texts, such as one of the SYN corpora, with a subcorpus of fiction texts, SYN2020-BEL). Next, we specify the positional attribute by which the keywords are identified, the metric by which they are sorted (Log-likelihood, Chi-square, or Difference index), and optionally a minimum or maximum frequency. The keywords can be further filtered using a regular expression; the default expression .* displays all results (or, rather, the first 1000 of them).
+The resulting list of keywords is displayed as a table sorted by the selected metric; the remaining two metrics are shown as well, followed by columns with the absolute and relative frequencies in both corpora. Each keyword can be viewed in a concordance for either corpus via the positive filter (<fc #4682b4>p</fc> to the right of the absolute frequency value).
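The three sorting metrics can be illustrated with a minimal Python sketch. It uses the formulas commonly found in corpus linguistics (two-cell log-likelihood, Pearson chi-square on a 2x2 table, and a difference index computed from relative frequencies); whether KonText computes them in exactly this way is an assumption, and the function name `keyness` and its parameters are hypothetical.

```python
import math


def keyness(freq_focus, size_focus, freq_ref, size_ref):
    """Return (log_likelihood, chi_square, difference_index) for one item.

    freq_* are the item's absolute frequencies, size_* the corpus sizes
    in tokens. Assumes the item occurs at least once in one of the corpora.
    """
    total_freq = freq_focus + freq_ref
    total_size = size_focus + size_ref

    # Expected frequencies if the item were equally common in both corpora.
    exp_focus = size_focus * total_freq / total_size
    exp_ref = size_ref * total_freq / total_size

    # Log-likelihood (two-cell form); a zero frequency contributes nothing.
    ll = 0.0
    for observed, expected in ((freq_focus, exp_focus), (freq_ref, exp_ref)):
        if observed > 0:
            ll += observed * math.log(observed / expected)
    ll *= 2

    # Pearson chi-square on the 2x2 table (item vs. all other tokens).
    cells = (
        (freq_focus, exp_focus),
        (freq_ref, exp_ref),
        (size_focus - freq_focus, size_focus - exp_focus),
        (size_ref - freq_ref, size_ref - exp_ref),
    )
    chi2 = sum((obs - exp) ** 2 / exp for obs, exp in cells)

    # Difference index on relative frequencies, ranging from -100 to 100.
    rel_focus = freq_focus / size_focus
    rel_ref = freq_ref / size_ref
    din = 100 * (rel_focus - rel_ref) / (rel_focus + rel_ref)

    return ll, chi2, din
```

For instance, `keyness(20, 1000, 10, 1000)` describes an item that is twice as frequent in the selected corpus as in a reference corpus of the same size. An item with the same relative frequency in both corpora scores zero on all three metrics.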
 ===== Recent queries =====
