Differences
This shows you the differences between two versions of the page.
en:cnk:syn [2021/12/17 12:08] – [Corpus SYN] michalkren | en:cnk:syn [2021/12/17 12:08] – [Corpus SYN] michalkren |
---|
~~NOTOC~~ | ~~NOTOC~~ |
| |
====== Corpus SYN ====== | ====== SYN corpus ====== |
| |
The **SYN** is a non-reference corpus consisting of texts from all reference [[en:pojmy:synchronni|synchronic]] [[en:pojmy:psany|written]] corpora of the SYN series published up until the given version of the SYN corpus (for example, [[en:cnk:syn:verze3|SYN version 3]] from the year 2014 includes the corpora [[en:cnk:syn2000|SYN2000]], [[en:cnk:syn2005|SYN2005]], [[en:cnk:syn2006pub|SYN2006PUB]], [[en:cnk:syn2009pub|SYN2009PUB]], [[en:cnk:syn2010|SYN2010]] and [[en:cnk:syn2013pub|SYN2013PUB]], as can be seen in the following table). The texts have been processed by the newest versions of the tools for [[en:pojmy:token|tokenization]], [[en:pojmy:segmentace|segmentation]], [[en:pojmy:morfologicka_analyza|morphological analysis]] and [[en:pojmy:desambiguace|disambiguation]]. | **SYN** is a non-reference corpus consisting of texts from all reference [[en:pojmy:synchronni|synchronic]] [[en:pojmy:psany|written]] corpora of the SYN series published up until the given version of the SYN corpus (for example, [[en:cnk:syn:verze3|SYN version 3]] from the year 2014 includes the corpora [[en:cnk:syn2000|SYN2000]], [[en:cnk:syn2005|SYN2005]], [[en:cnk:syn2006pub|SYN2006PUB]], [[en:cnk:syn2009pub|SYN2009PUB]], [[en:cnk:syn2010|SYN2010]] and [[en:cnk:syn2013pub|SYN2013PUB]], as can be seen in the following table). The texts have been processed by the newest versions of the tools for [[en:pojmy:token|tokenization]], [[en:pojmy:segmentace|segmentation]], [[en:pojmy:morfologicka_analyza|morphological analysis]] and [[en:pojmy:desambiguace|disambiguation]]. |
| |
The SYN corpus is not representative: the vast majority of its texts are newspapers and magazines, a consequence of how easily such texts can be obtained. | The SYN corpus is not representative: the vast majority of its texts are newspapers and magazines, a consequence of how easily such texts can be obtained. |