Differences
This shows you the differences between two versions of the page.
en:cnk:online:gen1 [2022/12/22 12:35] – vaclavcvrcek | en:cnk:online:gen1 [2022/12/22 14:13] (current) – [ONLINE1 (1st generation)] vaclavcvrcek | ||
---|---|---|---|
Line 2: | Line 2: | ||
====== ONLINE1 (1st generation) ====== | ====== ONLINE1 (1st generation) ====== | ||
- | ONLINE_NOW and ONLINE_ARCHIVE are two corpora which together create a monitor | + | Monitor |
- | + | ||
- | Both corpora differ in their extent and periodicity of updates: | + | |
- | * **ONLINE_NOW** -- contains daily updates from the current month plus 6 preceding months; | + | |
- | * **ONLINE_ARCHIVE** -- contains data since Feb 2017 until the date when ONLINE_NOW begins; updated every month | + | |
<WRAP right 35%> | <WRAP right 35%> | ||
- | ^ <fs medium> | + | ^ <fs medium> |
- | ^ Size (as of Nov 2020) ^ Number of [[en: | + | ^ Size ^ Number of tokens | |
- | ^ ::: ^ Number of sentences <s> | | + | ^ ::: ^ Number of sentences <s> | |
- | ^ Additional information ^ [[en: | + | ^ Additional information ^ Reference | NO | |
- | ^ ::: ^ [[en: | + | ^ ::: ^ Representative | NO | |
+ | ^ ::: ^ Period covered | 1/2017 – 3/2021 | | ||
^ ::: ^ Year of publication | 2020 | | ^ ::: ^ Year of publication | 2020 | | ||
</ | </ | ||
- | The ONLINE_NOW and ONLINE_ARCHIVE corpora are disjunctive, | ||
- | ==== Updates ==== | ||
- | The key feature of the ONLINE corpora is regular updates. This means that their contents **change continually**, | + | ===== Corpus |
- | Updates of the ONLINE_NOW corpus take place **daily around 9:00 (CET)**, when the data from the previous day is added and published. The size of each update varies from 4 to 8 million tokens, depending on the amount of downloaded material. On the first day of every month, the oldest month of the ONLINE_NOW corpus is moved to ONLINE_ARCHIVE. | + | Compared to the [[en:cnk: |
- | Updates of the ONLINE_ARCHIVE corpus thus take place **every month**, when a whole month is removed from ONLINE_NOW and added to ONLINE_ARCHIVE (always the month that is by then half a year old). | ||
- | |||
- | <fs smaller> | ||
- | For instance, on Aug 25, ONLINE_NOW contains data from Feb 1 until Aug 24 (inclusive), | ||
- | </ | ||
- | |||
- | |||
- | ===== Corpus structure ===== | ||
- | |||
- | Compared to the [[en: | ||
* **news** -- internet news | * **news** -- internet news | ||
* **facebook** -- posts, including comments (the collection of Facebook data was discontinued in December 2020) | * **facebook** -- posts, including comments (the collection of Facebook data was discontinued in December 2020) | ||
Line 90: | Line 75: | ||
<WRAP round tip 70%> | <WRAP round tip 70%> | ||
- | Cvrček, V. – Procházka, P.: //ONLINE_NOW: monitorovací korpus internetové češtiny//. Ústav Českého národního korpusu FF UK, Praha 2020 [cit. YYYY-MM-DD((Concrete day in the year-month-day format, e.g. 2020-10-02.))]. Available from: http:// | + | Cvrček, V. – Procházka, P.: //ONLINE1: monitoring corpus of online Czech//. Ústav Českého národního korpusu FF UK, Praha 2020. Available from: http:// |
- | + | ||
- | Cvrček, V. – Procházka, P.: // | + | |
</ | </ | ||
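
The update rule described in the removed "Updates" text (ONLINE_NOW holds the current month plus the 6 preceding months, with data up to the previous day; everything older, back to Feb 2017, sits in ONLINE_ARCHIVE) comes down to simple date arithmetic. The following is a minimal sketch of that arithmetic only. The function names `online_now_window` and `online_archive_window` are hypothetical and are not part of any corpus tool, and the sketch assumes the daily update (around 9:00 CET) has already run on the given day.

```python
from datetime import date, timedelta

# Assumption taken from the removed text: the archive starts in Feb 2017.
ARCHIVE_START = date(2017, 2, 1)


def online_now_window(today: date, months_back: int = 6) -> tuple[date, date]:
    """Return the (first day, last day) covered by ONLINE_NOW on `today`.

    Hypothetical helper: assumes that day's update has already been published.
    """
    # Count months linearly so that stepping back across a year boundary works.
    month_index = today.year * 12 + (today.month - 1) - months_back
    start = date(month_index // 12, month_index % 12 + 1, 1)
    # The newest material is the data from the previous day.
    end = today - timedelta(days=1)
    return start, end


def online_archive_window(today: date) -> tuple[date, date]:
    """Return the (first day, last day) covered by ONLINE_ARCHIVE on `today`."""
    now_start, _ = online_now_window(today)
    # The archive ends the day before ONLINE_NOW begins, so the two are disjoint.
    return ARCHIVE_START, now_start - timedelta(days=1)


if __name__ == "__main__":
    print(online_now_window(date(2020, 8, 25)))
    print(online_archive_window(date(2020, 8, 25)))
```

For Aug 25 the first function yields Feb 1 to Aug 24, which matches the example given in the removed text.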