Differences

This shows you the differences between two versions of the page.

en:cnk:codit [2021/03/22 13:56] Michal Křenen:cnk:codit [2021/03/29 14:18] (current) – [CODIT corpus] Michal Křen
Line 1: Line 1:
 ====== CODIT corpus ====== ====== CODIT corpus ======
  
-//Corpus diacronico dell’italiano// +//Corpus diacronico dell’italiano// -- ‘Diachronic corpus of Italian’
- +
-‘Diachronic corpus of Italian’+
  
 {{ :en:cnk:codit-logo.png?direct&180|}} {{ :en:cnk:codit-logo.png?direct&180|}}
  
-The CODIT corpus is a balanced diachronic corpus of written Italian of around 33 million tokensit covers a period ranging from the earliest attestations of the Italian language (i.e. the XIII century) to 1947. Its structure recalls that shown by the MIDIA corpus (//Morfologia Italiana in Diacronia// ‘Italian Morphology in Diachrony’, 7 million tokens).((The MIDIA corpus is available at this link: http://www.corpusmidia.unito.it/ (accessed: 15/01/2021).)) The corpus currently consists of raw (not annotated) texts, but POS annotation and lemmatization will be added.+The CODIT corpus is a balanced diachronic corpus of written Italian of around 33 million tokens. The corpus has been compiled by [[https://www.unimib.it/maria-silvia-micheli|Maria Silvia Micheli]] and it covers a period ranging from the earliest attestations of Italian language (i.e. the 13th century) to 1947. Its structure recalls that shown by the [[http://www.corpusmidia.unito.it/|MIDIA]] corpus (//Morfologia Italiana in Diacronia// ‘Italian Morphology in Diachrony’, 7.5 million tokens). The corpus currently consists of raw (not annotated) texts, but POS annotation and lemmatization will be added.
 The corpus is structured into five subcorpora, depending on the chronological period. The periodization follows that adopted for the MIDIA corpus: it is based on important linguistic and social facts of the Italian history. Particularly, the five subcorpora are the following: The corpus is structured into five subcorpora, depending on the chronological period. The periodization follows that adopted for the MIDIA corpus: it is based on important linguistic and social facts of the Italian history. Particularly, the five subcorpora are the following:
  
- +  - XIII century -- 1375: this subcorpus represents a period ranging between the earliest attestations of the Italian language and the Boccaccio’s death. 
- +  - 1376 -- 1532: this subcorpus represents a period encompassing Humanism and Renaissance. It ends in 1532 with the publication of the third edition of the //Orlando furioso// by Ludovico Ariosto. 
-  - XIII century-1375: this subcorpus represents a period ranging between the earliest attestations of the Italian language and the Boccaccio’s death. +  - 1533 -- 1691: this subcorpus represents the literary Mannerism and Baroque. It ends in 1691 with the publication of the third edition of the //Vocabolario// by the Accademia della Crusca. 
-  - 1376-1532: this subcorpus represents a period encompassing Humanism and Renaissance. It ends in 1532 with the publication of the third edition of the //Orlando furioso// by Ludovico Ariosto. +  - 1692 -- 1840: this subcorpus encompasses the Enlightenment and Romantic period. It ends in 1840 with the publication of the final edition of the //Promessi Sposi// by Alessandro Manzoni.  
-  - 1533-1691: this subcorpus represents the literary Mannerism and Baroque. It ends in 1691 with the publication of the third edition of the //Vocabolario// by the Accademia della Crusca. +  - 1841 -- 1947: this subcorpus represents a period ranging from the Risorgimento to the end of the Second World War. It ends in 1947 with the publication of the Italian Constitution.
-  - 1692-1840: this subcorpus encompasses the Enlightenment and Romantic period. It ends in 1840 with the publication of the final edition of the //Promessi Sposi// by Alessandro Manzoni.  +
-  - 1841-1947: this subcorpus represents a period ranging from the Risorgimento to the end of the Second World War. It ends in 1947 with the publication of the Italian Constitution.+
  
  
Line 29: Line 25:
 ^ scientifici|  0|  593,168|  716,098|  824,532|  742,856| ^ scientifici|  0|  593,168|  716,098|  824,532|  742,856|
 ^ teatro|  79,213|  478,787|  545,645|  541,389|  546,750| ^ teatro|  79,213|  478,787|  545,645|  541,389|  546,750|
-TOT|  4,511,873|  6,589,372|  7,496,501|  6,996,485|  7,502,574|+TOTAL|  4,511,873|  6,589,372|  7,496,501|  6,996,485|  7,502,574|
  
-**Table 1**. CODIT: structure and size+**Table 1**: CODIT structure and size
  
-===== How to cite =====+===== How to cite CODIT =====
  
 <WRAP round tip 70%> <WRAP round tip 70%>