Differences

This shows you the differences between two versions of the page.

--- en:manualy:kwords [2016/11/20 18:22] – [How it works] veronikapojarova
+++ en:manualy:kwords [2023/11/02 14:33] – jankocek
@@ Line 1: / Line 1: @@
 ====== KWords ======
-{{ kurz:kwords-logo.png?nolink&200|}}
+{{ :manualy:k-words_logo_V2.png?nolink&200|}}
 The KWords application is used for the analysis of texts based on their comparison with the general usage ([[en:pojmy:referencni|reference]] corpus). Its aim is to identify so-called [[en:pojmy:keyword|keywords]], which are [[en:pojmy:word|word forms]] appearing in the inspected text with a significantly higher frequency than in the reference corpus which should reflect the common usage. These key words serve as a basis for textual analysis and interpretation.
@@ Line 37: / Line 37: @@
 ==== Thematic concentration ====
-Words which are highlighted in <html><span style="background-color: yellow">yellow</span></html> in the analyzed text are those which bear thematic concentration (TC words). They are not identified through comparison with a reference corpus, but only by their placement in the frequency distribution of the units in the analyzed text: when we arrange all the words in the text from those which are most frequent and down to words which appear only once, we get a so-called [[en:pojmy:zipf|Zipf]] distribution. In this distribution we are looking for a so-called //h// point, for which we can say that rank = frequency (e.g. 32nd most frequent word has a frequency of 32 occurrences). All autosemantic words (bearing meaning independent of context) above this point (i.e. in our case with a frequency higher than 32) we label thematic concentration. More details and a specific application of this approach to literary texts can be found for example in the article of [[http://www.cechradek.cz/publ/2013_Davidova_Cech_Tematicka_koncentrace_Jehlicka_NR.pdf|R. Čech]] (2013).
+Words which are highlighted in yellow in the analyzed text are those which bear thematic concentration (TC words). They are not identified through comparison with a reference corpus, but only by their placement in the frequency distribution of the units in the analyzed text: when we arrange all the words in the text from those which are most frequent and down to words which appear only once, we get a so-called [[en:pojmy:zipf|Zipf]] distribution. In this distribution we are looking for a so-called //h// point, for which we can say that rank = frequency (e.g. 32nd most frequent word has a frequency of 32 occurrences). All autosemantic words (bearing meaning independent of context) above this point (i.e. in our case with a frequency higher than 32) we label thematic concentration. More details and a specific application of this approach to literary texts can be found for example in the article of [[http://www.cechradek.cz/publ/2013_Davidova_Cech_Tematicka_koncentrace_Jehlicka_NR.pdf|R. Čech]] (2013).
 ===== How it works =====
@@ Line 68: / Line 68: @@
 <WRAP round box 49%>
-[[en:manualy:kontext:index|KonText interface]] • [[syd|SyD]] • [[morfio|Morfio]] • [[treq|Treq]] • [[en:pojmy:korpusovy_manazer|Corpus manager]] • [[en:pojmy:nastroje|Corpus tools]]
+[[en:manualy:kontext:index|KonText interface]] • [[en:manualy:syd|SyD]] • [[en:manualy:morfio|Morfio]] • [[en:manualy:treq|Treq]] • [[en:pojmy:korpusovy_manazer|Corpus manager]] • [[en:pojmy:nastroje|Corpus tools]]
 </WRAP>
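
The //h// point rule quoted in the "Thematic concentration" hunk above (rank = frequency, with autosemantic words above that point labelled as TC words) can be illustrated with a minimal sketch. This is not the KWords implementation. The names ''h_point'', ''thematic_words'' and ''FUNCTION_WORDS'' are hypothetical, the function-word list is a tiny placeholder for a real test of autosemantic status, and the fallback for texts in which no rank exactly equals its frequency (taking the largest rank whose frequency is still at least that rank) is an assumption made here.

```python
from collections import Counter

# Hypothetical, deliberately tiny list of function (synsemantic) words.
# A real analysis would need a proper list or part-of-speech tagging.
FUNCTION_WORDS = {"the", "a", "of", "and", "in", "to", "is"}


def h_point(frequencies):
    """Return the largest rank whose frequency is at least that rank.

    When some rank exactly equals its frequency, this is the h point as
    described in the text. Otherwise it is an assumed approximation.
    """
    ranked = sorted(frequencies, reverse=True)
    h = 0
    for rank, freq in enumerate(ranked, start=1):
        if freq >= rank:
            h = rank
        else:
            break
    return h


def thematic_words(tokens, function_words=FUNCTION_WORDS):
    """Return (h, words) where words are non-function words with frequency > h."""
    counts = Counter(token.lower() for token in tokens)
    h = h_point(counts.values())
    words = sorted(
        word
        for word, freq in counts.items()
        if freq > h and word not in function_words
    )
    return h, words


# Toy text: frequencies are the=5, corpus=4, of=3, word=2, text=1.
tokens = ["the"] * 5 + ["corpus"] * 4 + ["of"] * 3 + ["word"] * 2 + ["text"]
h, tc_words = thematic_words(tokens)
print(h, tc_words)
```

In the toy text the third most frequent word occurs three times, so the h point is 3. Only "the" and "corpus" occur more often than that, and "the" is dropped as a function word, which leaves "corpus" as the single TC word.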
