OnomOs Corpus
The OnomOs Corpus is a linguistically processed database of texts from the periodicals Rudé právo (published 1920-1995) and Právo (1995-present). It contains one issue from each decade in which (Rudé) Právo was published. Only texts in which the linguistic component dominates were included; advertisements and classifieds, cinema, theatre and radio programmes, some types of texts from the sports section (e.g. scoreboards and player rosters), comic strips and crossword puzzles were therefore excluded. The composition of the corpus is presented in more detail in Figure 1. In total, the corpus contains 255,149 tokens.
Figure 1: Structure of the OnomOs corpus (in tokens)