Syntactic Complexity

InterCorp release 16ud is annotated with several measures of syntactic complexity. The measures are stored as metadata for each sentence and each text in every linguistically annotated language. In KonText, they can be displayed and queried like any other metadata item, such as author or sentence ID.

Measures for sentences

maxNPLength: number of words in the longest noun phrase
maxNPDepth: number of embeddings in the noun phrase with the longest chain of embeddings
sLength: sentence length = no. of words in the sentence (punctuation excluded)
subRatio: subordination ratio = (no. of T-units + no. of clauses) / no. of T-units¹⁾
maxTreeDepth: maximum number of clause embeddings (coordination does not count)
mdd: mean dependency distance: average number of word boundaries between words and their heads
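
As an illustration of how sLength, mdd and subRatio follow from the definitions above, here is a minimal Python sketch. It is not the code used to annotate InterCorp. The `Token` class, the function names and the sample sentence are hypothetical, and two points are assumptions that this page does not specify: the root word (which has no head) is skipped when computing mdd, and punctuation tokens are kept in the mdd calculation. The subRatio function applies the formula exactly as written above and takes the T-unit and clause counts as given.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int    # 1-based position of the word in the sentence
    form: str   # word form
    upos: str   # Universal Dependencies part-of-speech tag
    head: int   # 1-based position of the head, 0 for the root


def s_length(tokens):
    """Sentence length: number of tokens, punctuation excluded."""
    return sum(1 for t in tokens if t.upos != "PUNCT")


def mean_dependency_distance(tokens):
    """Average number of word boundaries between each word and its head.

    Assumption: the root (head == 0) is skipped, punctuation is kept.
    """
    distances = [abs(t.idx - t.head) for t in tokens if t.head != 0]
    return sum(distances) / len(distances) if distances else 0.0


def sub_ratio(t_units, clauses):
    """Subordination ratio as defined above: (T-units + clauses) / T-units."""
    if t_units <= 0:
        raise ValueError("a sentence needs at least one T-unit")
    return (t_units + clauses) / t_units


# Hypothetical parsed sentence: "The dog barked loudly."
sentence = [
    Token(1, "The", "DET", 2),
    Token(2, "dog", "NOUN", 3),
    Token(3, "barked", "VERB", 0),
    Token(4, "loudly", "ADV", 3),
    Token(5, ".", "PUNCT", 3),
]

print(s_length(sentence))                  # 4
print(mean_dependency_distance(sentence))  # 1.25
print(sub_ratio(2, 3))                     # 2.5
```

In the sample sentence the four non-root tokens lie 1, 1, 1 and 2 positions from their heads, so mdd is 5 / 4 = 1.25.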

Measures for texts

The following measures are averages of the corresponding sentence-level measures. The mdd value, by contrast, is computed as the average over all words in the text.

maxNPLengthAvg: average number of words in the longest noun phrase
maxNPDepthAvg: average number of embeddings in the noun phrase with the longest chain of embeddings
sLengthAvg: average sentence length = no. of words in the sentence (punctuation excluded)
subRatioAvg: average subordination ratio = (no. of T-units + no. of clauses) / no. of T-units
maxTreeDepthAvg: average maximum number of clause embeddings (coordination does not count)
mdd: mean dependency distance: average number of word boundaries between words and their heads
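
The difference between averaging per sentence and averaging per word can be shown with a short Python sketch. It is only an illustration under stated assumptions: the function name `text_measures`, the dictionary layout and the sample values are hypothetical, and each sentence is assumed to come with its sLength and a list of dependency distances for its words.

```python
from statistics import mean


def text_measures(sentences):
    """Aggregate sentence-level values into text-level values.

    Each sentence is a dict with the keys:
      "sLength"   - sentence length (punctuation excluded)
      "distances" - dependency distance of every word that has a head
    """
    # sLengthAvg: plain average of the sentence-level values.
    s_length_avg = mean(s["sLength"] for s in sentences)

    # mdd: pooled over all words in the text, not averaged per sentence.
    all_distances = [d for s in sentences for d in s["distances"]]
    mdd = sum(all_distances) / len(all_distances)

    return {"sLengthAvg": s_length_avg, "mdd": mdd}


# Hypothetical text with two sentences.
text = [
    {"sLength": 2, "distances": [1]},
    {"sLength": 4, "distances": [1, 1, 2]},
]

result = text_measures(text)
print(result["sLengthAvg"])  # 3
print(result["mdd"])         # 1.25
```

Averaging the two sentence-level mdd values instead (1.0 and about 1.33) would give roughly 1.17, so the word-level average weights longer sentences more heavily.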

In addition to the syntactic complexity measures, each text of sufficient length also includes two measures of lexical diversity.

¹⁾ A T-unit is a main clause including all its embedded/dependent clauses. Each top-level clausal conjunct, including any embedded/dependent clauses, counts as a T-unit.
