Differences
This shows you the differences between two versions of the page.
en:cnk:codit [2021/03/24 23:34] – [How to cite] michalkren | en:cnk:codit [2021/03/29 14:18] (current) – [CODIT corpus] michalkren | ||
---|---|---|---|
Line 5: | Line 5: | ||
{{ : | {{ : | ||
- | The CODIT corpus is a balanced diachronic corpus of written Italian of around 33 million tokens; it covers a period ranging from the earliest attestations of the Italian language (i.e. the XIII century) to 1947. Its structure recalls that shown by the [[http:// | + | The CODIT corpus is a balanced diachronic corpus of written Italian of around 33 million tokens. The corpus has been compiled by [[https:// |
The corpus is structured into five subcorpora, one for each chronological period. The periodization follows that adopted for the MIDIA corpus: it is based on important linguistic and social facts of Italian history. Particularly, | The corpus is structured into five subcorpora, one for each chronological period. The periodization follows that adopted for the MIDIA corpus: it is based on important linguistic and social facts of Italian history. Particularly, | ||