A morphological tag (commonly just called a tag) is a summary of the grammatical information about a specific word (position) in a given context. A tag is usually generated automatically by morphological analysis followed by disambiguation.
Tags are positional attributes. A morphological tag in the Czech CNC corpora consists of a sequence of symbols (letters and digits), each of which has a specific meaning determined by the position it occupies in the tag. In the Czech sentence Po promoci na londýnské universitě odjel jsem roku 1878 do Netley na školení vojenských chirurgů. the word form promoci (although this form is potentially morphologically ambiguous) has the morphological tag NNFS6-----A-----, which indicates that it is a common noun (NN), feminine (F), singular (S), in the locative case (6), and not negated (A).
The set of rules and values that can occur in a tag is called a tagset. The positional tagset used in the Czech CNC corpora has 16 positions (starting with SYN2020, we use a modified tagset), each of which carries information about a specific grammatical category.
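As a rough illustration of how a positional tag can be read, the following Python sketch splits the example tag into its 16 positions and keeps those that carry a value. It assumes the tag is written with plain hyphens (NNFS6-----A-----) and that a hyphen marks a position with no value. The names decode_tag and POSITION_NAMES are hypothetical, and the position labels cover only the positions filled in this example, not the full tagset.

```python
TAG_LENGTH = 16  # number of positions stated in the text

# Partial, illustrative position labels (an assumption for this sketch,
# not the complete tagset definition).
POSITION_NAMES = {
    1: "part of speech",
    2: "detailed part of speech",
    3: "gender",
    4: "number",
    5: "case",
    11: "negation",
}


def decode_tag(tag: str) -> dict[int, str]:
    """Return {position: symbol} for every position that carries a value."""
    if len(tag) != TAG_LENGTH:
        raise ValueError(f"expected {TAG_LENGTH} positions, got {len(tag)}")
    # Positions are counted from 1; "-" is assumed to mean "no value here".
    return {i: ch for i, ch in enumerate(tag, start=1) if ch != "-"}


filled = decode_tag("NNFS6-----A-----")
for position, symbol in filled.items():
    name = POSITION_NAMES.get(position, "not labelled in this sketch")
    print(f"{position:>2}  {symbol}  ({name})")
```

Because meaning depends on position, the same symbol can stand for different things: in this tag, N in position 1 and N in position 2 are looked up under different categories.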
Different languages use different tagsets; each of these tagsets is described separately.