Differences

This shows you the differences between two versions of the page.

en:pojmy:lemma [2022/04/19 20:37]
Michal Škrabal [Sublemma]
en:pojmy:lemma [2022/04/20 14:07] (current)
David Lukeš [Sublemma]
Line 12: Line 12:
 ===== Sublemma =====
  
-Starting with the SYN2020 corpus, there is two-level lemmatization in Czech corpora: each form is given the sublemma attribute in addition to the lemma attribute. While a lemma may include multiple variants of a single word (e.g. the lemma //filozof// represents all forms with both //filozof// and //filosof// stems), sublemmata delimit subgroups of forms according to this alternation (the sublemma //filozof// represents only forms with the stem //filozof//, while the sublemma //filosof// represents only forms with the stem //filosof//). If the word has no variants, the sublemma is identical to the lemma (e.g. the lemma //kniha// represents the same set of forms as the sublemma //kniha//).
+Starting with the SYN2020 corpus, Czech corpora feature two-level lemmatization: each form is given a sublemma attribute in addition to the lemma attribute. While a lemma may include multiple variants of a single word (e.g. the lemma //filozof// represents all forms with both //filozof// and //filosof// stems), sublemmas delimit subgroups of forms according to this alternation (the sublemma //filozof// represents only forms with the stem //filozof//, while the sublemma //filosof// represents only forms with the stem //filosof//). If the word has no variants, the sublemma is identical to the lemma (e.g. the lemma //kniha// represents the same set of forms as the sublemma //kniha//).
  
-Different types of variants are handled as sublemmata (e.g. //mýdlo/mejdlo//, //okno/vokno//, //citron/citrón//, //email/e-mail//, //myslet/myslit//, //mýt/mejt//, //péci/péct/píct//, //kuchyně/kuchyň//, //antivirus/antivir//, //sedm/sedum//, //tenhle/tendle/tenle//, //ačkoli/ačkoliv//, proper names //Robert/Róbert/Roberto//, //Atény/Athény//). Sublemmata are aldo used to distinguish some specific groups of forms that are included under one lemma (e.g. negated forms of adjectives and adverbs //černý/nečerný//, //hezky/nehezky//, short forms of adjectives //mladý/mlád//, suppletive forms //dobře/lépe/líp//, //člověk/lidé//).
+Different types of variants are handled as sublemmas (e.g. //mýdlo/mejdlo//, //okno/vokno//, //citron/citrón//, //email/e-mail//, //myslet/myslit//, //mýt/mejt//, //péci/péct/píct//, //kuchyně/kuchyň//, //antivirus/antivir//, //sedm/sedum//, //tenhle/tendle/tenle//, //ačkoli/ačkoliv//, proper names //Robert/Róbert/Roberto//, //Atény/Athény//). Sublemmas are also used to distinguish some specific groups of forms that are subsumed under one lemma (e.g. negated forms of adjectives and adverbs //černý/nečerný//, //hezky/nehezky//, short forms of adjectives //mladý/mlád//, suppletive forms //dobře/lépe/líp//, //člověk/lidé//).
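
The two-level scheme described in this revision can be illustrated with a minimal Python sketch. It is not how the corpus tooling is implemented: the `Token` type, the `forms_by` helper and the handful of toy tokens are assumptions made up for illustration, and only the //filozof/filosof// and //kniha// examples come from the text above.

```python
from collections import defaultdict
from typing import NamedTuple


class Token(NamedTuple):
    """Hypothetical token record carrying the two lemmatization levels."""
    word: str
    lemma: str
    sublemma: str


# Toy data (assumed for illustration): word forms with their lemma and sublemma.
TOKENS = [
    Token("filozof", "filozof", "filozof"),
    Token("filozofa", "filozof", "filozof"),
    Token("filosof", "filozof", "filosof"),
    Token("filosofa", "filozof", "filosof"),
    Token("kniha", "kniha", "kniha"),
    Token("knihy", "kniha", "kniha"),
]


def forms_by(attribute: str, tokens: list[Token]) -> dict[str, set[str]]:
    """Group word forms by the value of the given attribute ('lemma' or 'sublemma')."""
    groups: dict[str, set[str]] = defaultdict(set)
    for token in tokens:
        groups[getattr(token, attribute)].add(token.word)
    return dict(groups)


by_lemma = forms_by("lemma", TOKENS)
by_sublemma = forms_by("sublemma", TOKENS)

# The lemma covers both spelling variants; each sublemma covers only one of them.
print(sorted(by_lemma["filozof"]))
print(sorted(by_sublemma["filozof"]))
print(sorted(by_sublemma["filosof"]))
```

In this toy data the lemma //filozof// gathers the forms of both variants, each sublemma gathers only the forms of its own stem, and for //kniha//, which has no variants, the lemma and the sublemma yield the same set of forms.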
  
 ===== The link between a lemma and lexeme =====