Multidimensional analysis of Czech

Multidimensional analysis (MDA) is a method developed by Douglas Biber 1) for the empirical study of textual variability. MDA assumes that textual variability manifests itself in the use of linguistic features from different levels of language (from phonology and morphology through lexicon to syntax and pragmatics). When a text is composed, the use of one set of features is often conditioned or supported by the use of another. To describe variability, it is therefore best to group features into dimensions according to how they co-occur in texts. Dimensions derived in this way represent the basic characteristics along which texts vary and on the basis of which individual registers can be defined.
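
The idea of co-occurrence can be illustrated with a minimal sketch that correlates feature frequencies across texts. The feature names, the five texts and all the rates below are invented for illustration and are not taken from Koditex. Pearson correlation is used here only as a simple stand-in for the co-occurrence patterns that the factor analysis later summarizes.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented rates per 1,000 tokens in five hypothetical texts (not real data).
rates = {
    "finite_verbs": [150.0, 140.0, 90.0, 80.0, 120.0],
    "past_tense":   [70.0, 65.0, 30.0, 25.0, 50.0],
    "adjectives":   [60.0, 70.0, 120.0, 130.0, 90.0],
}

# Print the correlation for every pair of features.
names = list(rates)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a} ~ {b}: r = {pearson(rates[a], rates[b]):+.2f}")
```

In this toy data the two verbal features rise and fall together, while adjectives move in the opposite direction. Features that pattern like this would be expected to end up at opposite poles of the same dimension.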

The procedure for modelling register variability with MDA has become established over the years and consists of the following steps:

  • preparation of a corpus (see the Koditex corpus),
  • selection of features and their operationalization,
  • factor analysis,
  • interpretation of results.
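
The second step (operationalization) can be sketched as follows, on the assumption that each feature is counted in a tagged text, normalized to a rate per 1,000 tokens and then standardized across texts. This is a common practice in MDA work, but it is not stated on this page. The tag names, the three tiny texts and the helper functions `rate_per_1000` and `standardize` are hypothetical; the features used for Czech MDA are considerably richer than a single tag lookup.

```python
from statistics import mean, pstdev

# Hypothetical tagged texts: lists of (token, tag) pairs. The tag names are
# placeholders, not the tagset actually used for Koditex.
texts = {
    "text_a": [("šel", "VERB"), ("domů", "ADV"), ("a", "CONJ"), ("spal", "VERB")],
    "text_b": [("velký", "ADJ"), ("starý", "ADJ"), ("dům", "NOUN"), ("stojí", "VERB")],
    "text_c": [("nový", "ADJ"), ("plán", "NOUN"), ("města", "NOUN"), ("Brna", "NOUN")],
}

# Each feature is operationalized here as "tokens carrying a given tag".
FEATURES = {"verbs": "VERB", "adjectives": "ADJ"}

def rate_per_1000(tagged, tag):
    """Occurrences of `tag` per 1,000 tokens of the text."""
    hits = sum(1 for _, t in tagged if t == tag)
    return 1000 * hits / len(tagged)

def standardize(values):
    """Convert values to z-scores (mean 0, standard deviation 1 across texts)."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]

# Feature-by-text table of normalized rates, then of z-scores.
rates = {
    feat: [rate_per_1000(tagged, tag) for tagged in texts.values()]
    for feat, tag in FEATURES.items()
}
zscores = {feat: standardize(vals) for feat, vals in rates.items()}

for feat in FEATURES:
    print(feat, rates[feat], [round(z, 2) for z in zscores[feat]])
```

A standardized feature-by-text table of this kind is the usual input to the factor analysis in the third step, which in practice is run with a statistics package rather than written by hand.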

Dimensions of Czech MDA

Based on empirical findings 2), the number of dimensions for Czech MDA was set at eight. Each dimension is characterized by several prominent linguistic features, which carry either positive or negative values on the dimension's scale. Each dimension was given an overarching name that reflects the interpretation of both of its poles.

Dimensions in Czech MDA:

  1. dynamic (+) vs. static (–)
  2. spontaneous (+) vs. prepared (–)
  3. higher (+) vs. lower (–) level of cohesion
  4. polythematic (+) vs. monothematic (–)
  5. higher (+) vs. lower (–) amount of addressee coding
  6. general (+) vs. particular (–)
  7. prospective (+) vs. retrospective (–)
  8. attitudinal (+) vs. factual (–)

Overview of prominent linguistic features

The numbers in brackets are loadings, i.e. the extent to which the presence of a given feature in a text contributes to its placement on the dimension.

dynamic (+) vs. static (–)
  Features (+):
  • verbal tense – past (0.98)
  • verbs (0.96)
  • finite verbs (0.95)
  • indicative (0.95)
  • verbal aspect (perf.) (0.93)
  • 3rd person pronouns (0.78)
  Features (–):
  • incongruent nominal postmodifiers (-0.79)
  • adjectives (-0.78)
  • abstract nouns (-0.72)
  • congruent premodifiers (-0.72)
  • genitive (-0.72)
  • adjective clusters (-0.70)

spontaneous (+) vs. prepared (–)
  Features (+):
  • contact expressions (0.97)
  • filler words (0.85)
  • demonstrative pronouns (without “to”) (0.82)
  • interjections (0.82)
  • expressive particles (other – the rest from COH2, AMP and DOWN) (0.80)
  • pronoun non-dropping (0.79)
  Features (–):
  • prepositional cases in general (-0.62)
  • clauses with interrogative and relative adverbs (-0.57)
  • prepositions (-0.56)
  • verbal aspect (perf.) (-0.49)
  • nominative + accusative (-0.46)
  • unigrams (zTTR) (-0.46)

higher (+) vs. lower (–) level of cohesion
  Features (+):
  • correlatives (0.59)
  • nominal predicate (0.53)
  • relative clauses of the “který” type (0.45)
  • possessive pronouns (0.44)
  • inventory of pronouns (0.44)
  Features (–):
  • numerals (-0.43)

polythematic (+) vs. monothematic (–)
  Features (+):
  • bigrams (zTTR) (0.76)
  • unigrams (zTTR) (0.70)
  • toponyms (0.37)
  Features (–):
  • thematic concentration (-0.61)
  • Yule's coefficient (-0.49)
  • verbal nouns (-0.45)
  • verbal voice (passive) (-0.42)

higher (+) vs. lower (–) amount of addressee coding
  Features (+):
  • questions (all) (0.69)
  • verbs in 2nd person (0.66)
  • wh-questions (0.63)
  • 2nd person pronouns (0.62)
  • verbal tense – future (0.53)
  Features (–):
  • average clause length in number of tokens (-0.36)
  • frequent ngrams (-0.30)

general (+) vs. particular (–)
  Features (+):
  • coordination (0.58)
  • semantically empty adjectives (0.41)
  Features (–):
  • anthroponyms (-0.49)
  • numerals (-0.40)
  • temporal expressions (-0.36)

prospective (+) vs. retrospective (–)
  Features (+):
  • verbal tense – present (0.77)
  • verbal tense – future (0.55)
  • nominal predicate adj. (0.52)
  • imperative (0.42)
  • verb in 2nd person (0.40)
  Features (–):
  • verbal tense – past (-0.74)
  • 3rd person pronouns (-0.43)
  • possessive adjectives (-0.39)
  • relative clauses of the “jenž” type (-0.36)

attitudinal (+) vs. factual (–)
  Features (+):
  • particles weakening meaning (downtoners/hedges) (0.68)
  • restrictors (0.63)
  • particles strengthening meaning (amplifiers/boosters) (0.57)
  • particles structuring the text (0.52)
  • adverbs (0.50)
  Features (–):
  • coordination (-0.33)

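
To show how such loadings can be used to place a text on a dimension, the following minimal sketch computes a score for the first dimension (dynamic vs. static). The loadings are copied from the overview above. The scoring rule, which adds the z-scores of the (+) features and subtracts those of the (–) features, follows the way dimension scores are commonly computed in MDA and is an assumption here, not a description of how the Czech MDA scores were obtained. The functions `dimension_score` and `pole` and the z-scores of the sample text are hypothetical.

```python
# Loadings for the first dimension, copied from the overview above.
DIM1_LOADINGS = {
    "verbal tense – past": 0.98,
    "verbs": 0.96,
    "finite verbs": 0.95,
    "indicative": 0.95,
    "verbal aspect (perf.)": 0.93,
    "3rd person pronouns": 0.78,
    "incongruent nominal postmodifiers": -0.79,
    "adjectives": -0.78,
    "abstract nouns": -0.72,
    "congruent premodifiers": -0.72,
    "genitive": -0.72,
    "adjective clusters": -0.70,
}

def dimension_score(zscores, loadings):
    """Add z-scores of positively loaded features, subtract negatively loaded ones.

    Only the sign of each loading is used (an assumed, unweighted rule).
    Features missing from `zscores` count as 0.0, i.e. the corpus mean.
    """
    return sum(
        (1 if loading > 0 else -1) * zscores.get(feature, 0.0)
        for feature, loading in loadings.items()
    )

def pole(score, positive="dynamic", negative="static"):
    """Name the pole that the sign of the score points to."""
    if score > 0:
        return positive
    if score < 0:
        return negative
    return "neutral"

# Hypothetical z-scores of one text (invented for illustration).
text_z = {"verbs": 1.2, "finite verbs": 1.0, "adjectives": -0.8, "genitive": -0.5}

score = dimension_score(text_z, DIM1_LOADINGS)
print(round(score, 2), pole(score))
```

A text with more verbs and fewer adjectives and genitives than average therefore lands on the dynamic side. Weighting each z-score by its loading instead of using only the sign would be an equally plausible variant.
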
1) Biber, D. (1988). Variation Across Speech and Writing. Cambridge, England: Cambridge University Press; Biber, D. (1995). Dimensions of Register Variation: A Cross-Linguistic Comparison. Cambridge, England: Cambridge University Press; Biber, D., & Conrad, S. (2009). Register, Genre, and Style. Cambridge, England: Cambridge University Press.
2) Cvrček, V., Komrsková, Z., Lukeš, D., Poukarová, P., Řehořková, A., & Zasina, A. J. (2018). From extra- to intratextual characteristics: Charting the space of variation in Czech through MDA. Corpus Linguistics and Linguistic Theory.