Lists: Frequency list browser

The Lists application lets users browse frequency lists of various units (lemma, word, and lc, i.e. the lowercased word form) in representative corpora of written Czech (SYN2000, SYN2005, SYN2010, SYN2015) and in the corpus of spontaneous spoken Czech, Oral v1. For each written corpus, users can access not only the overall results but also frequency information for three sub-corpora (fiction, non-fiction, and journalistic texts). The frequency lists contain only units that consist of alphabetic characters and hyphens and that have a non-zero frequency in each of the written corpora (SYN2000, SYN2005, SYN2010, and SYN2015) or, in the case of Oral, a non-zero frequency in that corpus.

When browsing the lists by corpus (first tab), four types of frequency information are shown for each unit:

  • absolute frequency,
  • relative frequency (IPM),
  • average reduced frequency (ARF),
  • average reduced frequency normalized per million words (ARFn).
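
As an illustration of how the four measures listed above relate to one another, the following minimal Python sketch computes them from the token positions of a unit in a corpus. It assumes the commonly used definition of ARF (the sum of the cyclic distances between consecutive occurrences, each capped at the average distance N/f, divided by N/f) and assumes that ARFn is ARF rescaled to a corpus of one million tokens. The function names are hypothetical, and the sketch is not taken from the Lists application itself.

```python
from typing import Sequence


def ipm(abs_freq: int, corpus_size: int) -> float:
    """Relative frequency: instances per million tokens (IPM)."""
    return abs_freq / corpus_size * 1_000_000


def arf(positions: Sequence[int], corpus_size: int) -> float:
    """Average reduced frequency from 0-based token positions of a unit.

    Assumption: the commonly used definition with cyclic distances,
    ARF = (1 / v) * sum(min(d_i, v)), where v = corpus_size / frequency.
    """
    f = len(positions)
    if f == 0:
        return 0.0
    pos = sorted(positions)
    v = corpus_size / f  # average distance between occurrences
    # The first gap wraps around from the last occurrence to the first one.
    gaps = [pos[0] + corpus_size - pos[-1]]
    gaps += [b - a for a, b in zip(pos, pos[1:])]
    return sum(min(d, v) for d in gaps) / v


def arf_n(positions: Sequence[int], corpus_size: int) -> float:
    """ARF normalized per million tokens (assumed to be ARF / N * 10^6)."""
    return arf(positions, corpus_size) / corpus_size * 1_000_000
```

With these definitions, a unit whose occurrences are spread evenly through the corpus has an ARF equal to its absolute frequency, while a unit whose occurrences are clustered in one place has an ARF close to 1. This is why ARF is less sensitive than absolute frequency to units that are frequent in only a few texts.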

The lemma table contains an additional column with part-of-speech (POS) information. The table can be sorted and filtered by any column; numerical columns can also be filtered by entering an interval of the form M … N (e.g. 10 … 99) into the field in the column’s header.
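
For users who download frequency data and want to reproduce this kind of interval filter offline, the following Python sketch parses an M … N expression and applies it to a list of rows. It is only an illustration: the function names, the row layout, and the choice of inclusive bounds are assumptions, not a description of how the Lists application is implemented.

```python
import re

# Accepts "10 … 99" (with the ellipsis character) as well as "10 ... 99".
_INTERVAL = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*(?:…|\.\.\.)\s*(\d+(?:\.\d+)?)\s*$"
)


def parse_interval(text):
    """Parse 'M … N' into a (low, high) pair of floats."""
    match = _INTERVAL.match(text)
    if match is None:
        raise ValueError(f"not an interval of the form 'M … N': {text!r}")
    low, high = float(match.group(1)), float(match.group(2))
    if low > high:
        raise ValueError(f"lower bound exceeds upper bound: {text!r}")
    return low, high


def filter_rows(rows, column, interval):
    """Keep rows whose value in `column` lies in the interval.

    Assumption: both bounds are inclusive.
    """
    low, high = parse_interval(interval)
    return [row for row in rows if low <= row[column] <= high]


# Made-up rows for illustration only; these are not real corpus data.
rows = [
    {"unit": "unit_a", "abs": 5},
    {"unit": "unit_b", "abs": 42},
    {"unit": "unit_c", "abs": 150},
]
selected = filter_rows(rows, "abs", "10 … 99")
```

Here `selected` contains only the row for `unit_b`, the one made-up value that falls between 10 and 99.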

The second tab in the browser provides a simple comparison of relative frequencies (IPM) and average reduced frequencies normalized per million words (ARFn) across individual registers. The other frequency measures depend on the size of the sub-corpus, so comparing them across registers would not be meaningful. The information in this tab is derived from the SYN2015 and Oral v1 corpora.

To ensure comparability and to make use of the newest versions of lemmatization and POS tagging, the data for the SYN2000, SYN2005, SYN2010, and SYN2015 corpora were taken from the corresponding sub-corpora of the SYN v7 corpus.

The application is available at http://www.korpus.cz/lists

In addition to the Lists application, the CNC also offers other options for working with frequency lists:

  • registered CNC users can create customized frequency lists using the Word list option of the KonText application,
  • it is possible to download comparative frequency lists (Czech only) directly from the web,
  • other frequency data can be obtained on request by e-mail to cnk (at) korpus.cz.

How to cite Lists

Křen, M. - Cvrček, V.: Lists: Frequency list browser. FF UK. Praha 2019. Available at: <http://www.korpus.cz/lists>.