en:pojmy:lemma [2022/04/13 14:51] – jankrivan | en:pojmy:lemma [2022/04/20 14:07] (current) – [Sublemma] lukes |
---|
The lemma as a unit originates from an abstraction of a [[en:pojmy:word|word form's]] morphological characteristics, and represents a set of forms which have the same root and differ only in their respective morphological affixes or orthographic form. In some approaches, the selected morphological variants are also associated with the lemma. | The lemma as a unit originates from an abstraction of a [[en:pojmy:word|word form's]] morphological characteristics, and represents a set of forms which have the same root and differ only in their respective morphological affixes or orthographic form. In some approaches, the selected morphological variants are also associated with the lemma. |
| |
====== Sublemma ====== | ===== Sublemma ===== |
| |
Starting with the SYN2020 corpus, lemmatization in Czech corpora is two-tiered: each form is given a sublemma attribute in addition to the lemma attribute. While a lemma can associate multiple variants of a single word (e.g. the lemma //filozof// represents all forms with both //filozof// and //filosof// stems), sublemmata delimit subgroups of forms according to this alternation (the sublemma //filozof// represents only forms with the stem //filozof//, the sublemma //filosof// represents only forms with the stem //filosof//). If the word is non-variant, the sublemma is identical to the lemma (e.g. a lemma //kniha// represents the same set of forms as a sublemma //kniha//). | Starting with the SYN2020 corpus, Czech corpora feature two-level lemmatization: each form is given a sublemma attribute in addition to the lemma attribute. While a lemma may include multiple variants of a single word (e.g. the lemma //filozof// represents all forms with both //filozof// and //filosof// stems), sublemmas delimit subgroups of forms according to this alternation (the sublemma //filozof// represents only forms with the stem //filozof//, while the sublemma //filosof// represents only forms with the stem //filosof//). If the word has no variants, the sublemma is identical to the lemma (e.g. the lemma //kniha// represents the same set of forms as the sublemma //kniha//). |
| |
Different types of variants are handled as sublemmata (e.g. //mýdlo/mejdlo//, //okno/vokno//, //citron/citrón//, //email/e-mail//, //myslet/myslit//, //mýt/mejt//, //péci/péct/píct//, //kuchyně/kuchyň//, //antivirus/antivir//, //sedm/sedum//, //tenhle/tendle/tenle//, //ačkoli/ačkoliv//, proper names //Robert/Róbert/Roberto//, //Atény/Athény//) and they are used to differentiate some specific groups of forms that are included under one lemma (e.g. negated forms of adjectives and adverbs //černý/nečerný//, //hezky/nehezky//, nominal forms of adjectives //mladý/mlád//, suppletion //dobře/lépe/líp//, //člověk/lidé//). | Different types of variants are handled as sublemmas (e.g. //mýdlo/mejdlo//, //okno/vokno//, //citron/citrón//, //email/e-mail//, //myslet/myslit//, //mýt/mejt//, //péci/péct/píct//, //kuchyně/kuchyň//, //antivirus/antivir//, //sedm/sedum//, //tenhle/tendle/tenle//, //ačkoli/ačkoliv//, proper names //Robert/Róbert/Roberto//, //Atény/Athény//). Sublemmas are also used to distinguish some specific groups of forms that are subsumed under one lemma (e.g. negated forms of adjectives and adverbs //černý/nečerný//, //hezky/nehezky//, short forms of adjectives //mladý/mlád//, suppletive forms //dobře/lépe/líp//, //člověk/lidé//). |
| |
===== The link between a lemma and lexeme ===== | ===== The link between a lemma and lexeme ===== |