Differences

This shows you the differences between two versions of the page.

en:cnk:klasifikace_textu_syn2015 [2020/08/25 13:52] – [Volnočasová publicistika (LEI)] Veronika Pojarová
en:cnk:klasifikace_textu_syn2015 [2020/08/25 14:17] (current) – [Overall classification] Veronika Pojarová
Line 28:
 The most significant changes compared to the previous SYN series classification:
 
-  * **Non-fiction** (previously scientific) **literature** (NFC) reflects a certain level of „proficiency“ and specialization of the target audience, and consists of three main types (''txtype''): scientific (SCI), professional (PRO) and popular (POP) literature. This macrogroup should be understood as the opposite of fiction and journalistic texts: for this reason, it also contains administrative texts (ADM) in the broadest sense as well as a group of texts that are on the borderline between fiction and non-fiction, most typically memoirs and autobiographies (MEM). By changing the name of this group from //scientific// to the more general //non-fiction// we hope to achieve a more accurate representation of its heterogeneous contents, while the term //scientific// is now assigned only to academic texts (SCI). The newly defined category of professional literature (PRO) includes texts which are characterized by large quantities of practical information primarily intended for professionals in a given field.
+  * **Non-fiction** (previously scientific) **literature** (NFC) reflects a certain level of „proficiency“ and specialization of the target audience, and consists of three main types (''txtype''): scientific (SCI), professional (PRO) and popular (POP) literature. This macro-group should be understood as the opposite of fiction and journalistic texts: for this reason, it also contains administrative texts (ADM) in the broadest sense as well as a group of texts that are on the borderline between fiction and non-fiction, most typically memoirs and autobiographies (MEM). By changing the name of this group from //scientific// to the more general //non-fiction// we hope to achieve a more accurate representation of its heterogeneous contents, while the term //scientific// is now assigned only to academic texts (SCI).
+The newly defined category of professional literature (PRO) includes texts which are characterized by large quantities of practical information primarily intended for professionals in a given field.
   * Non-fiction literature newly contains an additional level for the SCI, PRO and POP txtype – ''genre_group'', which was created by grouping together individual disciplines or fields into larger categories and makes it possible to analyze texts from similar or related fields together: humanities (HUM), social sciences (SSC), natural sciences (NAT) and technical sciences (FTS).
   * On the ''genre'' level, which contains the most detailed classification and reflects each specific field or discipline, the individual texts were classified in a way that would most accurately correspond with the subject categorization used by the [[http://text.nkp.cz/o-knihovne/odborne-cinnosti/zpracovani-fondu/vecne-zpracovani-vecne-autority/material-kon2|National Library of the Czech Republic]]. The fields are featured in detail in the table below.
Line 46:
 
 ^ HUM: humanities ^ SSC: social sciences ^ NAT: natural sciences ^ FTS: formal and technical sciences ^ ITD: interdisciplinary ^
-| ANT: anthropology, etnography\\ THE: theatre, film, dance\\ PHI: philosophy, religion\\ HIS: history\\ LAN: philology\\ INF: library and information science\\ ART: art, architecture | ECO: economy, business, logistics\\ POL: politics, military\\ LAW: law\\ PSY: psychology\\ SOC: sociology\\ REC: sports, recreation, hobbies\\ EDU: education | BIO: biology \\ PHY: physics\\ GEO: geography, geology\\ CHE: chemistry\\ MED: medicine\\ AGR: agriculture | MAT: mathematics\\ TEC: technology\\ ICT: information technology | ITD: interdisciplinary |
+| ANT: anthropology, ethnography\\ THE: theatre, film, dance\\ PHI: philosophy, religion\\ HIS: history\\ LAN: philology\\ INF: library and information science\\ ART: art, architecture | ECO: economy, business, logistics\\ POL: politics, military\\ LAW: law\\ PSY: psychology\\ SOC: sociology\\ REC: sports, recreation, hobbies\\ EDU: education | BIO: biology \\ PHY: physics\\ GEO: geography, geology\\ CHE: chemistry\\ MED: medicine\\ AGR: agriculture | MAT: mathematics\\ TEC: technology\\ ICT: information technology | ITD: interdisciplinary |
 
 ===== 3. Newspapers and magazines =====
Line 96:
   * front page
 
-===== Souhrnná klasifikace =====
+===== Overall classification =====
 
-Tabulka shrnuje klasifikaci textu do skupin podle atributů ''txtype_group'', ''txtype'', ''genre_group'' ''genre''.
+The following table offers a comprehensive summary of how texts are divided into categories based on the ''txtype_group'', ''txtype'', ''genre_group'' and ''genre'' attributes.
 
 ^  txtype_group  ^  txtype  ^  genre_group  ^  genre  ^
Line 106:
 | ::: | SCR: drama, screenplays | ::: | ::: |
 | ::: | X: other | ::: | ::: |
-| NFC: non-fiction literature | SCI: scientific literature\\ PRO: professional literature\\ POP: popular literature | HUM: humanities | ANT: anthropology, etnography|
+| NFC: non-fiction literature | SCI: scientific literature\\ PRO: professional literature\\ POP: popular literature | HUM: humanities | ANT: anthropology, ethnography|
 | ::: | ::: | ::: | THE: theatre, film, dance |
 | ::: | ::: | ::: | PHI: philosophy, religion |
Line 142:
 | ::: | ::: | ::: | MIX: society|
 
-Klasifikace textů je v SYN2015 doplněna o jejich další charakteristiky. Každý text má nově atribut [[seznamy:med|médium]], nabývající jednu z následujících hodnot:
-  * B: kniha
-  * J: časopis
-  * NWS: noviny
-  * OTH: jiná tiskovina
-  * REF: referenční příručka
-  * TXB: učební materiál
+The classification of texts in SYN2015 is supplemented by some of their other characteristics. Each text newly has the [[en:seznamy:med|medium]] attribute, which assigns to it one of the following values:
+  * B: book
+  * J: journal
+  * NWS: newspaper
+  * OTH: other printed medium
+  * REF: reference handbook
+  * TXB: textbook
 
-[{{ :cnk:syn2015-periodicita.png?direct&250|Podíl periodik a neperiodik v SYN2015.}}]
+[{{ :cnk:syn2015-periodicita.png?direct&250|The share of journals vs. non-journals in the SYN2015 corpus.}}]
 
-Dále vznikla i zcela nová kategorie udávající [[seznamy:periodicity|periodicitu]] daného titulu, která nabývá těchto hodnot:
-  * BI: nižší než měsíčník
-  * DA: deník
-  * MO: měsíčník
-  * NP: neperiodická publikace
-  * WE: týdeník, čtrnáctideník
+In addition, we have created a new attribute which identifies the [[en:seznamy:periodicity|periodicity]] of the given publication and can have one of the following values:
+  * BI: less than monthly
+  * DA: daily
+  * MO: monthly
+  * NP: non-periodical publication
+  * WE: weekly, fortnightly
 
-V atributu [[seznamy:audience|audience]] je uvedena informace o **věku předpokládaného čtenáře** textu: rozlišujeme texty určené pro obecné publikum (GEN) a dětem a mládeži (JUN).
+In the [[en:seznamy:audience|audience]] attribute you can find information about the **age of the text's intended reader**: we differentiate among texts written for the general public (GEN) and children and adolescents (JUN).
 
-Nově lze také u každého textu dohledat **pohlaví autora** ([[seznamy:authsex-transsex|authsex]]), případně **překladatele** ([[seznamy:authsex-transsex|transsex]]): žena (F), muž (M), neuvedeno (X).
+Each text also newly contains information about the **author's sex** ([[en:seznamy:authsex-transsex|authsex]]), or the **translator's sex** ([[en:seznamy:authsex-transsex|transsex]]): female (F), male (M), not specified (X).
 
-Stejně jako v předešlých korpusech patří mezi metainformace o textu samozřejmě název díla (''title''), autor (''author''), překladatel (''translator''), rok vydání (''pubyear''), rok prvního vydání (''first_published''), zdrojový jazyk (''[[seznamy:srclang|srclang]]'') a další charakteristiky.
+Of course, the metainformation available in previous corpora is also available here, namely ''title'', ''author'', ''translator'', year of publication (''pubyear''), year of first publication (''first_published''), source language (''[[en:seznamy:srclang|srclang]]'') and other characteristics.
 ===== The share of text types in the corpus=====