NET Corpus
The NET corpus is the first version of a synchronic corpus of Czech semi-official internet communication. It currently consists of two parts: discussion forums and blogs. Data coverage will be extended in future versions. Because one aim of NET is to map selected areas of internet communication, the corpus tries to capture each selected domain from its beginning. Future versions will also add newer content from the same domains, so that NET can capture how they change over time.
Discussion forums
This part of the corpus focuses on discussion forums running on the phpBB platform. For the time being, NET includes neither comments and discussions attached to articles nor social network data. The phpBB forums were sampled at random; the sample size is planned to increase in the future.
Personal blogs
Personal blogs were downloaded mostly from news servers and web magazines, where they often form a supplementary part of the main website. Blogs were selected for download according to their number of visits. No corporate or other formal blogs are included in the NET corpus.