
Differences

This shows you the differences between two versions of the page.

en:pojmy:syntakticka_komplexita [2024/09/30 18:17] – [Measures for texts] alexandrrosen
en:pojmy:syntakticka_komplexita [2024/10/18 20:39] (current) – [References] alexandrrosen
Line 1: Line 1:
 ====== Syntactic Complexity ======
 
-InterCorp release 16ud is annotated by several measures of syntactic complexity. They are specified as metadata for each sentence and each text, for each linguistically annotated language. In KonText, they can be displayed and queried like any other metadata items, such as author or sentence ID.
+InterCorp release 16ud is annotated by several measures of syntactic complexity. They are specified as metadata for each sentence and each text, for each linguistically annotated language. In KonText, they can be displayed and queried like any other metadata items, such as text author or sentence ID.
 
 In addition to the syntactic complexity measures, each text of sufficient length also includes two measures of **[[en:pojmy:lexikalni_bohatost|lexical diversity]]**.
 
 ===== Measures for sentences =====
+
+Also see below for [[https://wiki.korpus.cz/doku.php/en:pojmy:syntakticka_komplexita#how_the_measures_are_calculated_general_rules|general rules on calculating the measures]].
  
   * **maxNPLength**: number of words in the longest noun phrase
     * Punctuation is ignored.
-    * For the definition of noun phrase see **What counts as a noun phrase** below.
+    * For the definition of noun phrase see [[https://wiki.korpus.cz/doku.php/en:pojmy:syntakticka_komplexita#what_counts_as_a_noun_phrase|What counts as a noun phrase]] below.
   * **maxNPDepth**: for a noun phrase with the longest chain of embeddings: the number of such embeddings
     * For a bare head, the measure equals 0.
-    * Function words (such as determiners or postpositions) introduce an additional level of embedding.
+    * Function words (such as determiners or prepositions) introduce an additional level of embedding.
     * Punctuation is ignored.
     * Coordination does not introduce an additional level of embedding.
-    * For the definition of noun phrase see **What counts as a noun phrase** below.
+    * For the definition of noun phrase see [[https://wiki.korpus.cz/doku.php/en:pojmy:syntakticka_komplexita#what_counts_as_a_noun_phrase|What counts as a noun phrase]] below.
   * **sLength**: sentence length in the number of words
     * Punctuation is ignored.
   * **subRatio**: subordination ratio = (no. of T-units + no. of subordinate clauses) / no. of T-units
     * A T-unit is a main clause including all its embedded/dependent clauses. Each top-level clausal conjunct, including any embedded/dependent clauses, counts as a T-unit.
-    * Constituents other than clauses are ignored. Clauses are defined as subtrees headed by a node with one of the following ''deprel''s: ''csubj'', ''ccomp'', ''xcomp'', ''advcl'' or ''acl'' (see **What counts as a clause** below).
+    * Constituents other than clauses are ignored. Clauses are defined as subtrees headed by a node with one of the following ''deprel''s: ''csubj'', ''ccomp'', ''xcomp'', ''advcl'' or ''acl'' (see [[https://wiki.korpus.cz/doku.php/en:pojmy:syntakticka_komplexita#what_counts_as_a_clause|What counts as a clause]] below).
     * Function words (such as auxiliaries or conjunctions) are ignored.
   * **maxTreeDepth**: for a clause with the longest chain of embedded clauses: the number of such embeddings
     * For a bare head, the measure equals 0.
-    * Constituents other than clauses are ignored. Clauses are defined as subtrees headed by a node with one of the following ''deprel''s: ''csubj'', ''ccomp'', ''xcomp'', ''advcl'' or ''acl'' (see **What counts as a clause** below).
+    * Constituents other than clauses are ignored. Clauses are defined as subtrees headed by a node with one of the following ''deprel''s: ''csubj'', ''ccomp'', ''xcomp'', ''advcl'' or ''acl'' (see [[https://wiki.korpus.cz/doku.php/en:pojmy:syntakticka_komplexita#what_counts_as_a_clause|What counts as a clause]]).
     * Coordination does not introduce an additional level of embedding.
     * Function words (such as auxiliaries or conjunctions) are ignored.
   * **mdd**: mean dependency distance: average number of word boundaries between words and their heads
-    * punctuation is ignored
+    * Punctuation is ignored
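
The definitions of **sLength**, **subRatio** and **mdd** above can be sketched in Python as follows. This is an illustrative sketch only, not the code used to annotate InterCorp. The ''Token'' tuple, the function names, the renumbering of word positions after punctuation is dropped, and the way T-units are counted (the root plus verbal ''conj'' dependents of the root) are all assumptions made for this example.

```python
from typing import NamedTuple

class Token(NamedTuple):
    id: int      # 1-based position in the sentence
    head: int    # id of the head token, 0 for the root
    upos: str    # UD part-of-speech tag
    deprel: str  # UD dependency relation

# Relations that head a clause, as listed in the measure definitions.
CLAUSE_DEPRELS = {"csubj", "ccomp", "xcomp", "advcl", "acl"}

def sentence_length(tokens):
    """sLength: number of words, punctuation ignored."""
    return sum(1 for t in tokens if t.upos != "PUNCT")

def mean_dependency_distance(tokens):
    """mdd: average number of word boundaries between a word and its head.

    Assumption: punctuation is removed first and the remaining words are
    renumbered, so skipped punctuation adds no boundaries.
    """
    words = [t for t in tokens if t.upos != "PUNCT"]
    position = {t.id: i for i, t in enumerate(words)}
    # The root (head 0) has no head inside the sentence and is skipped.
    distances = [abs(position[t.id] - position[t.head])
                 for t in words if t.head in position]
    return sum(distances) / len(distances) if distances else 0.0

def sub_ratio(tokens):
    """subRatio = (T-units + subordinate clauses) / T-units.

    Assumption: a T-unit is the root or a verbal conjunct of the root.
    """
    roots = {t.id for t in tokens if t.head == 0}
    t_units = len(roots) + sum(
        1 for t in tokens
        if t.deprel == "conj" and t.head in roots and t.upos in {"VERB", "AUX"}
    )
    # Strip subtypes such as "acl:relcl" before comparing.
    subordinate = sum(1 for t in tokens
                      if t.deprel.split(":")[0] in CLAUSE_DEPRELS)
    return (t_units + subordinate) / t_units if t_units else 0.0

# "She said that he left ."
example = [
    Token(1, 2, "PRON", "nsubj"),
    Token(2, 0, "VERB", "root"),
    Token(3, 5, "SCONJ", "mark"),
    Token(4, 5, "PRON", "nsubj"),
    Token(5, 2, "VERB", "ccomp"),
    Token(6, 2, "PUNCT", "punct"),
]

print(sentence_length(example))           # 5
print(mean_dependency_distance(example))  # 1.75
print(sub_ratio(example))                 # 2.0
```

In the example, the single main clause with one ''ccomp'' gives one T-unit and one subordinate clause, hence a subordination ratio of (1 + 1) / 1 = 2.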
  
-N.B.: See below for general rules on calculating the measures.   
  
 ===== Measures for texts =====
 
-The following measures are average values based on the measures for sentences. The mdd value is counted as the average for all words in the text.
+The following measures are average values based on the measures for sentences. The **mdd** value is counted as the average for all words in the text. Average values for all combinations of a language and a text type in InterCorp v16ud are shown in the table [[https://wiki.korpus.cz/doku.php/en:cnk:intercorp:verze16ud#detailed_statistics|Detailed statistics]].
 
   * **maxNPLengthAvg**: average number of words in the longest noun phrase
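
The distinction between averaging per-sentence values and averaging over all words in a text can be illustrated with a short Python sketch. The function names and the dependency distances below are hypothetical and chosen only for illustration; they are not taken from InterCorp.

```python
def mean_of_sentence_values(values):
    """Average of per-sentence values, as for measures such as maxNPLengthAvg."""
    return sum(values) / len(values)

def text_mdd(distances_per_sentence):
    """Text-level mdd: average over all words in the text, not over sentences."""
    all_distances = [d for sentence in distances_per_sentence for d in sentence]
    return sum(all_distances) / len(all_distances)

# Hypothetical dependency distances for a two-sentence text.
distances = [[1, 1], [1, 2, 1, 3]]

# Per-sentence mdd values are 1.0 and 1.75.
sentence_mdds = [sum(s) / len(s) for s in distances]

print(mean_of_sentence_values(sentence_mdds))  # 1.375
print(text_mdd(distances))                     # 1.5
```

Because longer sentences contribute more words, the word-level average (1.5) differs from the average of the sentence-level values (1.375).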
Line 121: Line 123:
 Jagaiah, T., Olinghouse, N.G. & Kearns, D.M. (2020). Syntactic complexity measures: variation by genre, grade-level, students’ writing abilities, and writing quality. //Read Writ// **33**, 2577–2638. [[https://doi.org/10.1007/s11145-020-10057-x]]
 
-[[https://docs.google.com/document/d/1nSPzyhT6oHKUDN8A_uYmWrZH6tAmxTH_pUMOdjg01Eg/edit?usp=sharing|InterCorp a Universal Dependencies – nové možnosti výzkumu]] (workshop on 20 and 27 March 2024, held as part of the Theoretical-Methodological Seminar of the Institute of Czech Language and Theory of Communication)
+Rosen, A. (2024). Lexical and syntactic variability of languages and text genres – a corpus-based study. [[https://www.youtube.com/watch?v=E2ujmqt7Q2E|Recording]] from 14 October 2024: [[https://zil.ipipan.waw.pl/|Natural Language Processing Seminar]] organised by the [[https://zil.ipipan.waw.pl|Linguistic Engineering Group]] at the [[https://ipipan.waw.pl|Institute of Computer Science]], [[https://pan.pl|Polish Academy of Sciences]], [[https://zil.ipipan.waw.pl/seminarium-archiwum?action=AttachFile&do=view&target=2024-10-14.pdf|slides]].
 
-[[https://drive.google.com/file/d/1L9yTjj0bTrGgf8lDcOAsJoJOoeYEoPEm/view?usp=sharing|Exploring InterCorp v16ud: the potential of a multilingual parallel treebank with complexity and diversity metrics]] (slides from the seminar at the University of Warsaw, 10 July 2024)
+Rosen, A. (2024). [[https://drive.google.com/file/d/1L9yTjj0bTrGgf8lDcOAsJoJOoeYEoPEm/view?usp=sharing|Exploring InterCorp v16ud: the potential of a multilingual parallel treebank with complexity and diversity metrics]] (slides from the seminar at the University of Warsaw, 10 July 2024)