Differences

This shows you the differences between two versions of the page.

en:manualy:lists [2019/10/15 20:43] vaclavcvrcek
en:manualy:lists [2021/02/02 18:28] (current) michalkren
Line 1: Line 1:
 ====== Lists: Frequency list browser ======
  
-The //Lists// application allows the user to browse the frequency lists of various units ([[en:pojmy:lemma|lemma]], [[en:pojmy:word|word]] and [[en:pojmy:lc|lc]]) in representative corpora of written Czech language ([[en:cnk:syn2000|SYN2000]], [[en:cnk:syn2005|SYN2005]], [[en:cnk:syn2010|SYN2010]], [[en:cnk:syn2015|SYN2015]]) and in the corpus of spontaneous spoken Czech language [[en:cnk:oral|Oral v1]]. For each written Czech corpus, the users can access not only the overall results, but also frequency information for the three sub-corpora (fiction, scientific literature, and journalistic texts). Frequency lists only contain units which are made up of alphabetic symbols and hyphens, and which have a frequency higher than zero in each of the written corpora (SYN2000, SYN2005, SYN2010, and SYN2015).
+The //Lists// application allows the user to browse the frequency lists of various units ([[en:pojmy:lemma|lemma]], [[en:pojmy:word|word]] and lc) in representative corpora of written Czech language ([[en:cnk:syn2000|SYN2000]], [[en:cnk:syn2005|SYN2005]], [[en:cnk:syn2010|SYN2010]], [[en:cnk:syn2015|SYN2015]]) and in the corpus of spontaneous spoken Czech language [[en:cnk:oral|Oral v1]]. For each written Czech corpus, the users can access not only the overall results, but also frequency information for the three sub-corpora (fiction, non-fiction, and journalistic texts). Frequency lists only contain units which are made up of alphabetic symbols and hyphens, and which have a frequency higher than zero in each of the written corpora (SYN2000, SYN2005, SYN2010, and SYN2015), or, in case of Oral, a non-zero frequency in this corpus.
  
 When browsing the list by corpora (first tab), each unit has 4 types of frequency information:
Line 10: Line 10:
   * average reduced frequency normalized per million words (ARFn).
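The page does not spell out how the average reduced frequency is computed. The following is a minimal sketch of the commonly used definition, in which the corpus is treated as a circle and each gap between consecutive occurrences counts at most ''v = N / f'' (corpus size divided by raw frequency). The function names, and the per-million normalization in ''arf_per_million'', are assumptions made for illustration and are not taken from the //Lists// application.

```python
def average_reduced_frequency(positions, corpus_size):
    """ARF of a unit from the token positions of its occurrences.

    Assumed definition: ARF = (1 / v) * sum(min(d_i, v)), where
    v = corpus_size / f and d_i are the cyclic distances between
    consecutive occurrences.
    """
    f = len(positions)
    if f == 0:
        return 0.0
    positions = sorted(positions)
    v = corpus_size / f
    # Distance from the last occurrence, wrapping around to the first one.
    distances = [positions[0] + corpus_size - positions[-1]]
    # Distances between neighbouring occurrences.
    distances += [b - a for a, b in zip(positions, positions[1:])]
    return sum(min(d, v) for d in distances) / v


def arf_per_million(arf, corpus_size):
    """Hypothetical normalization of ARF per million words (ARFn)."""
    return arf * 1_000_000 / corpus_size


# Four evenly spread occurrences keep their full frequency,
# four clustered occurrences are reduced to little more than one.
even = average_reduced_frequency([0, 25, 50, 75], 100)
clustered = average_reduced_frequency([10, 11, 12, 13], 100)
```

Under this definition an evenly distributed unit has an ARF equal to its raw frequency, while a unit concentrated in one place has an ARF close to 1.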
  
-The lemma table contains an additional column with word class information ([[en:pojmy:pos|POS]]). The data in the table may be ordered and filtered by any of the columns;  filtering based on numerical data may also be achieved by writing a specific interval in the form of M ... N (e.g. 10 ... 99) into a field in the column’s header.
+The lemma table contains an additional column with word class information ([[en:pojmy:pos|POS]]). The data in the table may be ordered and filtered by any of the columns;  filtering based on numerical data may also be achieved by writing a specific interval in the form of ''M ... N'' (e.g. ''10 ... 99'') into a field in the column’s header.
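To illustrate the ''M ... N'' interval syntax, here is a small sketch of how such a filter could be applied to a frequency table that has been exported to a list of rows. The helper names ''parse_interval'' and ''filter_rows'' are hypothetical, and treating both bounds as inclusive is an assumption; the application itself may behave differently.

```python
import re

# Matches "M ... N" with optional surrounding whitespace, e.g. "10 ... 99".
_INTERVAL = re.compile(
    r"^\s*(-?\d+(?:\.\d+)?)\s*\.\.\.\s*(-?\d+(?:\.\d+)?)\s*$"
)


def parse_interval(text):
    """Parse an 'M ... N' string into a (low, high) pair of floats."""
    match = _INTERVAL.match(text)
    if match is None:
        raise ValueError(f"not an interval of the form 'M ... N': {text!r}")
    low, high = float(match.group(1)), float(match.group(2))
    if low > high:
        raise ValueError(f"lower bound exceeds upper bound: {text!r}")
    return low, high


def filter_rows(rows, column, interval_text):
    """Keep rows whose numeric value in `column` lies in the interval.

    Both bounds are treated as inclusive (an assumption).
    """
    low, high = parse_interval(interval_text)
    return [row for row in rows if low <= row[column] <= high]


# Illustrative data only, not real corpus frequencies.
rows = [
    {"lemma": "a", "freq": 5},
    {"lemma": "b", "freq": 10},
    {"lemma": "c", "freq": 99},
    {"lemma": "d", "freq": 100},
]
selected = filter_rows(rows, "freq", "10 ... 99")
```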
  
 The second tab in the browser provides a simple comparison of relative frequencies (IPM) and average reduced frequencies normalized per million words (ARFn) within individual registers (other frequency-related data are dependent on the size of the sub-corpus, rendering the results of such a comparison worthless). The information in this tab is derived from the SYN2015 and Oral v1 corpora.
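The reason why only per-million values are comparable across registers can be shown with a short sketch. IPM (instances per million) divides the raw frequency by the size of the sub-corpus; the function name and the numbers below are made up for illustration and are not actual SYN2015 or Oral v1 figures.

```python
def instances_per_million(frequency, corpus_size):
    """Relative frequency of a unit, normalized per million tokens."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return frequency * 1_000_000 / corpus_size


# Hypothetical sub-corpora of different sizes: the raw frequencies differ
# threefold, yet the unit is equally common in both registers.
ipm_small = instances_per_million(500, 10_000_000)
ipm_large = instances_per_million(1_500, 30_000_000)
```

Raw frequencies of 500 and 1,500 would suggest a large difference, whereas both normalized values are the same, which is why the comparison tab is limited to IPM and ARFn.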
Line 16: Line 16:
 For the purposes of comparison and the use of the newest versions of lemmatization and POS tagging, the data for the SYN2000, SYN2005, SYN2010, and SYN2015 corpora have been taken from the corresponding sub-corpora of the [[en:cnk:syn:verze7|SYN v7]] corpus.
  
-The application is available at: [[https://jupyter.korpus.cz/lists/]]
+**The application is available at [[http://www.korpus.cz/lists]]**
 + 
 +In addition to the //Lists// application, CNC also offers other options for working with the frequency lists: 
 +  * registered CNC users can create customized frequency lists using the [[en:manualy:kontext:novy_dotaz#word_list|Word list]] option of the [[https://kontext.korpus.cz/|KonText]] application, 
 +  * it is possible to download [[seznamy:srovnavaci_seznamy|comparative frequency lists]] (Czech only) directly from the web, 
 +  * other frequency data can be obtained upon request sent by e-mail to cnk (at) korpus.cz 
 + 
 +===== How to cite Lists ===== 
 + 
 +<WRAP round tip 80%> 
 +Křen, M. - Cvrček, V.: Lists: Frequency list browser. FF UK. Praha 2019. Available at: <http://www.korpus.cz/lists>
 +</WRAP>