Differences

This shows you the differences between two versions of the page.

en:manualy:kwords [2023/04/06 13:03] – [Thematic concentration] vaclavcvrcek
en:manualy:kwords [2023/11/13 10:01] (current) – [KWords] vaclavcvrcek
Line 1: Line 1:
 ====== KWords ======
  
-{{ :manualy:k-words_logo.png?nolink&200|}}
+{{ :manualy:kwords_logo_v2.png?nolink&|}}
  
-The KWords application is used for the analysis of texts based on their comparison with the general usage ([[en:pojmy:referencni|reference]] corpus). Its aim is to identify so-called [[en:pojmy:keyword|keywords]], which are [[en:pojmy:word|word forms]] appearing in the inspected text with a significantly higher frequency than in the reference corpus which should reflect the common usage. These key words serve as a basis for textual analysis and interpretation.
+The KWords application is used for the analysis of texts based on their comparison with the general usage ([[en:pojmy:referencni|reference]] corpus). Its aim is to identify so-called [[en:pojmy:keyword|keywords]], which are [[en:pojmy:word|word forms]] or [[en:pojmy:lemma|lemmas]] appearing in the inspected text with a significantly higher frequency than in the reference corpus which should reflect the common usage. These key words serve as a basis for textual analysis and interpretation.
  
 KWords is an online application (the only thing we need to use it is a web browser) and it is accessible without [[en:kurz:zaciname|registration]] to all users at **[[http://kwords.korpus.cz|kwords.korpus.cz]]**.
  
-The KWords application was originally created for the purpose of analyzing political speeches, and is being developed further in cooperation with [[http://www.brown.edu|Brown University]]. It is currently implemented for the analysis of Czech and English texts of up to approx. 20 thousand words.
+The first version of KWords was developed for the purpose of analyzing political speeches in collaboration with [[http://www.brown.edu|Brown University]]. The second version was developed as part of the [[https://threat-defuser.org|Threat-defuser project]]. This version supports more than 30 languages and allows keyword analysis as well as keymorph analysis.((see Fidler, M. - Cvrček, V.: [[https://doi.org/10.1515/cllt-2016-0073|Keymorph analysis, or how morphosyntax informs discourse]]. Corpus Linguistics and Linguistic Theory. 15/1, p39–70.))
  
 ===== Prominent units =====
Line 20: Line 20:
 The identification of [[en:pojmy:keyword|keywords]] takes place based on a comparison of each word's relative [[en:pojmy:frekvence|frequency]] in the given text with the same word's relative frequency in the reference corpus. Several tests are used to determine the statistical significance of the differences, two of which are implemented in KWords: [[en:pojmy:chi2|chi2]] and [[en:pojmy:loglikelihood|log-likelihood]]. Keywords in the analyzed text are marked <fc #ff0000>red</fc>
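
The comparison described above can be illustrated with a minimal Python sketch. It is not taken from the KWords source code: the function name ''keyness'', the 2x2 contingency-table formulation, the example counts and the 3.84 threshold (chi-square critical value for one degree of freedom at p < 0.05) are all assumptions made for illustration only.

```python
import math


def keyness(freq_text, size_text, freq_ref, size_ref):
    """Return (chi2, log_likelihood) for one word from a 2x2 contingency table.

    Hypothetical helper: rows are the analyzed text and the reference corpus,
    columns are "this word" and "all other tokens".
    """
    observed = [
        [freq_text, size_text - freq_text],
        [freq_ref, size_ref - freq_ref],
    ]
    total = size_text + size_ref
    row_sums = [size_text, size_ref]
    col_sums = [freq_text + freq_ref, total - freq_text - freq_ref]

    chi2 = 0.0
    log_likelihood = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if the word were equally frequent in both sources.
            expected = row_sums[i] * col_sums[j] / total
            if expected > 0:
                chi2 += (observed[i][j] - expected) ** 2 / expected
            if observed[i][j] > 0:
                log_likelihood += 2 * observed[i][j] * math.log(observed[i][j] / expected)
    return chi2, log_likelihood


# Illustrative (made-up) counts: 30 hits in a 1,000-word text,
# 200 hits in a 1,000,000-word reference corpus.
CRITICAL_VALUE = 3.84  # assumed threshold: p < 0.05, 1 degree of freedom
chi2, ll = keyness(30, 1_000, 200, 1_000_000)
more_frequent_in_text = 30 / 1_000 > 200 / 1_000_000
is_keyword = more_frequent_in_text and ll > CRITICAL_VALUE
print(f"chi2={chi2:.1f}, log-likelihood={ll:.1f}, keyword={is_keyword}")
```

Both statistics only measure how strongly the two relative frequencies differ, so the sketch checks the direction of the difference separately before treating a word as a keyword.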
  
-The results of the keyword analysis are always influenced by the choice of reference corpus, which should be seen as a neutral language background with which we compare the analyzed text. For example, when analyzing the New Year speeches of the last Communist president G. Husák, we notice that compared to current usage there is a high frequency of words such as //socialistický// (socialistic), //soudružky// (comrades) etc., but this is not the case when compared to a reference corpus from the same period. Currently, the following reference corpora can be used in the KWords application:
-  * for Czech
-    * [[en:cnk:syn2015|SYN2015]]
-    * [[en:cnk:syn2010|SYN2010]]
-    * [[en:cnk:syn2005|SYN2005]]
-    * diakon19 -- ad hoc corpus created from available data in the [[en:cnk:struktura#diachronnikorpus|diachronic part of the CNC]] covering the 19th Century
-    * totalita -- a corpus of ideological texts and official journalism from the period of Communist totalitarianism
-    * Oral -- the [[en:cnk:oral2006|Oral2006]] and [[en:cnk:oral2008|Oral2008]] corpora
-    * pub -- the journalistic section of the corpora [[en:cnk:syn2000|SYN2000]], [[en:cnk:syn2005|SYN2005]] and [[en:cnk:syn2010|SYN2010]]
-    * bel -- the fiction section of the corpora [[en:cnk:syn2000|SYN2000]], [[en:cnk:syn2005|SYN2005]] and [[en:cnk:syn2010|SYN2010]]
-    * odb -- specialized literature from the corpora [[en:cnk:syn2000|SYN2000]], [[en:cnk:syn2005|SYN2005]] and [[en:cnk:syn2010|SYN2010]]
-  * for English
-    * BNC -- [[http://www.natcorp.ox.ac.uk|British National Corpus]]
-    * COCA -- [[http://www.wordfrequency.info/100k.asp|Corpus of Contemporary American English]]
-    * InterCorp-EN v8 -- the English section of the parallel corpus [[en:cnk:intercorp|InterCorp]]
+The results of the keyword analysis are always influenced by the choice of reference corpus, which should be seen as a neutral language background with which we compare the analyzed text. For example, when analyzing the New Year speeches of the last Communist president G. Husák, we notice that compared to current usage there is a high frequency of words such as //socialistický// (socialistic), //soudružky// (comrades) etc., but this is not the case when compared to a reference corpus from the same period. Currently, the [[en:cnk:intercorp|InterCorp]] parallel corpus is available for all languages as a reference corpus.
 ==== Thematic concentration ====
  
Line 57: Line 43:
 ===== Application images =====
  
-[{{:kurz:kwords-vstup.png?direct&300|Inputting text into KWords}}]
-[{{:kurz:kwords-vystup.png?direct&300|Analyzed text with highlighted keywords}}]
-[{{:kurz:kwords-tab.png?direct&300|List of keywords}}]
-[{{:kurz:kwords-distrib.png?direct&300|Distribution of keywords throughout the analyzed text}}]
-[{{:kurz:kwords-links.png?direct&300|Mutual relations between keywords (keyword links)}}]
-[{{:kurz:kwords-comp.png?direct&300|Comparison of several speeches -- multi-analysis}}]
+{{:manualy:kwords2.png?direct&400 |}}
+{{:manualy:kwords2_nastaveni.png?direct&400 |}}
+{{:manualy:kwords2_klicova_slova.png?direct&400|}}
+{{:manualy:kwords2_graf.png?direct&400 |}}
+{{:manualy:kwords2_distribuce.png?direct&400 |}}
+{{:manualy:kwords2_konkordance.png?direct&400 |}}
+{{:manualy:kwords2_links.png?direct&400|}}
  
 +===== Application images (previous version)=====
 +
 +[{{:kurz:kwords-vstup.png?direct&400 |Inputting text into KWords}}]
 +[{{:kurz:kwords-vystup.png?direct&400 |Analyzed text with highlighted keywords}}]
 +[{{:kurz:kwords-tab.png?direct&400|List of keywords}}]
 +[{{:kurz:kwords-distrib.png?direct&400 |Distribution of keywords throughout the analyzed text}}]
 +[{{:kurz:kwords-links.png?direct&400 |Mutual relations between keywords (keyword links)}}]
 +[{{:kurz:kwords-comp.png?direct&400|Comparison of several speeches -- multi-analysis}}]
  
 ==== Related links ====