Morphological tags
A morphological tag (commonly just called a tag) is a summary of the grammatical information about a specific word (position) in a given context. A tag is usually generated automatically, based on morphological analysis followed by disambiguation.
Tags are positional attributes. A morphological tag in the Czech CNC corpora consists of a sequence of symbols (letters and numbers), each of which has a specific meaning determined by the position it occupies in the tag. In the Czech sentence Po promoci na londýnské universitě odjel jsem roku 1878 do Netley na školení vojenských chirurgů. the word form promoci (although this form is potentially morphologically ambiguous) has the morphological tag NNFS6-----A-----, which indicates that it is a:
- noun (=N)
- common noun (=N)
- femininum, i.e. feminine gender (=F)
- singular number (=S)
- in the sixth case, i.e. locative (=6)
Tagset
A set of rules and values which can occur in a tag is called a tagset. The positional tagset used in the Czech CNC corpora has 16 positions (a modified tagset is used from SYN2020 onwards), each of which carries information about a specific grammatical category:
1. Word class
2. A more detailed specification of word class
3. Grammatical gender
4. Number
5. Case
6. Possessor's gender
7. Possessor's number
8. Person
9. Tense
10. Degree
11. Negation
12. Active/passive
13. not used
14. not used
15. Variant, stylistic marking etc.
16. Aspect
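Because every position has a fixed meaning, a tag can be read by index. The following is a minimal Python sketch of that idea, not an official CNC tool: the function name `decode_tag` and the short category labels are hypothetical, chosen for this illustration, and the sketch assumes that a hyphen marks a position that does not apply to the word. It uses the example tag for promoci, written with plain hyphens.

```python
# Shorthand labels for the 16 positions, in order (hypothetical names for this sketch).
POSITIONS = [
    "word class",
    "detailed word class",
    "gender",
    "number",
    "case",
    "possessor gender",
    "possessor number",
    "person",
    "tense",
    "degree",
    "negation",
    "active/passive",
    "not used (13)",
    "not used (14)",
    "variant",
    "aspect",
]


def decode_tag(tag: str) -> dict[str, str]:
    """Pair each filled position of a 16-character tag with its category label."""
    if len(tag) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} positions, got {len(tag)}")
    # Assumption: "-" means the category does not apply, so it is skipped.
    return {name: symbol for name, symbol in zip(POSITIONS, tag) if symbol != "-"}


decoded = decode_tag("NNFS6-----A-----")
print(decoded)
# {'word class': 'N', 'detailed word class': 'N', 'gender': 'F', 'number': 'S', 'case': '6', 'negation': 'A'}
```

The sketch only pairs symbols with category labels; interpreting the symbols themselves (for example, that 6 in the case position means locative) requires the value tables of the tagset.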
Tagsets used in the parallel corpus InterCorp
Different languages in InterCorp use different tagsets. Descriptions of these tagsets can be found here.