Differences
This shows you the differences between two versions of the page.
en:cnk:syn2020 [2020/12/27 12:15] – [How to cite SYN2020] michalkren | en:cnk:syn2020 [2021/01/21 16:49] – [Multiple lemmatization and tagging (aggregate)] michalkren | ||
---|---|---|---|
Line 7: | Line 7: | ||
</ | </ | ||
- | <WRAP right 35%> | + | <WRAP right 45%> |
^ <fs medium> | ^ <fs medium> | ||
^ Positions ^ Number of positions (tokens) | 121 826 797 | | ^ Positions ^ Number of positions (tokens) | 121 826 797 | | ||
Line 74: | Line 74: | ||
 * the boundaries for the synchrony of newspapers and magazines remain unchanged, i.e. the text must have been published in the period covered by the corpus (for SYN2020, the period between 2015 and 2019). | * the boundaries for the synchrony of newspapers and magazines remain unchanged, i.e. the text must have been published in the period covered by the corpus (for SYN2020, the period between 2015 and 2019). | ||
- | ===== Annotation of SYN2020: changes | + | ===== Annotation of SYN2020: changes |
==== Tokenization ==== | ==== Tokenization ==== | ||
Line 108: | Line 108: | ||
==== Multiple lemmatization and tagging (aggregate) ==== | ==== Multiple lemmatization and tagging (aggregate) ==== | ||
- | In the SYN2020 corpus, **multiple lemmas and tags** for a special group of words, so-called **aggregates**, | + | In the SYN2020 corpus, **multiple lemmas and tags** for a special group of words, so-called **aggregates** |
+ | |||
+ | ==== Automatic corpus annotation ==== | ||
+ | For SYN2020, the entire annotation process is automatic. Its detailed description including the annotation accuracy and a rich bibliography to both the tools and data can be found on a [[cnk: | ||
====== How to cite SYN2020 ====== | ====== How to cite SYN2020 ====== |