| en:pojmy:tag [2018/05/27 22:52] – michalskrabal | en:pojmy:tag [2026/01/16 12:04] (current) – [Morphological tags] krivan |
|---|
| A morphological tag (commonly called a **tag**) is a summary of the grammatical information about a specific word ([[en:pojmy:pozice|position]]) in the given context. A tag is usually generated automatically by [[en:pojmy:morfologicka_analyza|morphological analysis]] followed by [[en:pojmy:desambiguace|disambiguation]]. | A morphological tag (commonly called a **tag**) is a summary of the grammatical information about a specific word ([[en:pojmy:pozice|position]]) in the given context. A tag is usually generated automatically by [[en:pojmy:morfologicka_analyza|morphological analysis]] followed by [[en:pojmy:desambiguace|disambiguation]]. |
| |
| Tags are [[en:pojmy:atributy_pozicni|positional attributes]]. A morphological tag in the Czech CNC corpora consists of a sequence of symbols (letters and numbers) which have a specific meaning based on the position which they occupy in the code. In the Czech sentence //Po promoci na londýnské universitě odjel jsem roku 1878 do Netley na školení vojenských chirurgů.// the word form //promoci// (although this form is potentially morphologically ambiguous) has a morphological tag ''NNFS6-----A-----'', which indicates that it is a: | Tags are [[en:pojmy:atributy_pozicni|positional attributes]]. A morphological tag in the Czech CNC corpora consists of a sequence of symbols (letters and numbers) which have a specific meaning based on the position which they occupy in the code. In the Czech sentence //Po promoci na londýnské universitě odjel jsem roku 1878 do Netley na školení vojenských chirurgů.// the word form //promoci// (although this form is potentially morphologically ambiguous) has a morphological tag ''<nowiki>NNFS6-----A----</nowiki>'', which indicates that it is a: |
| * noun (=N) | * noun (=N) |
| * common noun (=N) | * common noun (=N) |
| ===== Tagset ===== | ===== Tagset ===== |
| |
| A set of rules and values which can occur in a tag is called a tagset. The positional [[en:seznamy:tagy#popis_jednotlivych_pozic_znacky|tagset used in the Czech CNC corpora]] has 16 positions, each of which carries some information about a specific grammatical category: | A set of rules and values which can occur in a tag is called a tagset. The positional [[en:seznamy:tagy#popis_jednotlivych_pozic_znacky|tagset used in the Czech CNC corpora]] has 15 positions (starting from SYN2020), each of which carries some information about a specific grammatical category: |
| |
| - Word class | - Word class |
| - Number | - Number |
| - Case | - Case |
| - Possessive case | - Possessive gender |
| - Possessive number | - Possessive number |
| - Person | - Person |
| - Negation | - Negation |
| - Active/passive | - Active/passive |
| | - Aspect |
| - //not used// | - //not used// |
| - //not used// | - Variant (stylistic marking etc.) |
| - Variant, stylistic marking etc. | |
| - Aspect | |
| |
| | Previously, a modified tagset with 16 positions was used (with Position 13 not used and Position 16 marking Aspect). |
| |
| ===== Tagsets used in the parallel corpus InterCorp ===== | ===== Tagsets used in the parallel corpus InterCorp ===== |
| Different tagsets are used for different languages. Descriptions of these tagsets can be found [[en:cnk:intercorp:verze10#morphosyntactic_annotation|here]]. | Different tagsets are used for different languages. Descriptions of these tagsets can be found [[en:cnk:intercorp:verze10#morphosyntactic_annotation|here]]. Some recent versions of the InterCorp parallel corpus have been annotated in terms of morphological categories, syntactic functions and syntactic structure following the [[en:pojmy:ud|UD guidelines]]. |
| |
| |
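As an illustration of how a positional tag from the Czech CNC corpora is read, the following minimal sketch splits the 15-position tag ''<nowiki>NNFS6-----A----</nowiki>'' from the example above into named positions. The names ''POSITIONS'' and ''decode_tag'' are hypothetical helpers, not part of any CNC tool. Positions 2, 3, 9 and 10 do not appear in the list shown in this revision comparison; their names (detailed word class, gender, tense, degree) are assumed here from the general Prague-style positional tagset and should be checked against the full tagset description.

```python
# Names of the 15 tag positions (tagset used from SYN2020 onwards).
# Entries marked "assumed" are not listed in the text above.
POSITIONS = [
    "word class",           # 1
    "detailed word class",  # 2  (assumed)
    "gender",               # 3  (assumed)
    "number",               # 4
    "case",                 # 5
    "possessive gender",    # 6
    "possessive number",    # 7
    "person",               # 8
    "tense",                # 9  (assumed)
    "degree",               # 10 (assumed)
    "negation",             # 11
    "active/passive",       # 12
    "aspect",               # 13
    "not used",             # 14
    "variant",              # 15
]


def decode_tag(tag):
    """Map each filled position of a 15-symbol tag to its symbol.

    Positions holding '-' carry no value for the word and are skipped.
    """
    if len(tag) != len(POSITIONS):
        raise ValueError(
            f"expected {len(POSITIONS)} symbols, got {len(tag)}: {tag!r}"
        )
    return {name: symbol for name, symbol in zip(POSITIONS, tag) if symbol != "-"}


decoded = decode_tag("NNFS6-----A----")
for name, symbol in decoded.items():
    print(f"{name}: {symbol}")
```

A tag in the older 16-position format (with Aspect in position 16) has a different length, so this sketch rejects it rather than guessing at the mapping.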