A morphological tag (commonly just called a tag) is a summary of the grammatical information about a specific word (position) in a given context. A tag is usually generated automatically, based on morphological analysis and subsequent disambiguation.
Tags are positional attributes. A morphological tag in the Czech CNC corpora consists of a sequence of symbols (letters and numbers), each of which has a specific meaning determined by the position it occupies in the tag. In the Czech sentence "Po promoci na londýnské universitě odjel jsem roku 1878 do Netley na školení vojenských chirurgů.", the word form promoci (although this form is potentially morphologically ambiguous) has the morphological tag NNFS6-----A----, which indicates that it is a noun (N in position 1), more specifically a common noun (N in position 2), feminine (F in position 3), singular (S in position 4), in the locative case (6 in position 5) and affirmative, i.e. not negated (A in position 11).
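Reading a tag position by position can be sketched in a few lines of Python. This is only an illustration, not a CNC tool: the names decode_tag, POSITION_LABELS and VALUE_LABELS are hypothetical, the tables cover just the symbols that occur in the example tag, and the sketch assumes that a hyphen marks a position that does not apply to the word.

```python
# Labels for the positions discussed in the example (1-based, as in the text).
# Hypothetical helper tables: they cover only the example tag, not the tagset.
POSITION_LABELS = {
    1: "part of speech",
    2: "detailed part of speech",
    3: "gender",
    4: "number",
    5: "case",
    11: "negation",
}

VALUE_LABELS = {
    1: {"N": "noun"},
    2: {"N": "common noun"},
    3: {"F": "feminine"},
    4: {"S": "singular"},
    5: {"6": "locative"},
    11: {"A": "affirmative (not negated)"},
}


def decode_tag(tag):
    """Map each filled position of a positional tag to a readable value."""
    decoded = {}
    for position, symbol in enumerate(tag, start=1):
        if symbol == "-":
            # Assumption: a hyphen means the category does not apply.
            continue
        category = POSITION_LABELS.get(position, f"position {position}")
        # Fall back to the raw symbol when no label is known for it.
        value = VALUE_LABELS.get(position, {}).get(symbol, symbol)
        decoded[category] = value
    return decoded


for category, value in decode_tag("NNFS6-----A----").items():
    print(f"{category}: {value}")
```

Because the meaning of a symbol depends on where it stands, the lookup is keyed by position first and by symbol second: the same letter N means "noun" in position 1 and "common noun" in position 2.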
The set of rules and values that can occur in a tag is called a tagset. The positional tagset used in the Czech CNC corpora has 15 positions (starting from SYN2020), each of which carries information about a specific grammatical category.
Previously, a modified tagset with 16 positions was used (with Position 13 not used and Position 16 marking Aspect).
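When working with tags from corpora of different ages, the tag length is a quick way to tell the two variants apart. The following is a minimal sketch under that assumption; the function names tagset_variant and legacy_aspect_symbol are hypothetical, and the 16-character tag in the usage lines is made up for illustration rather than taken from a corpus.

```python
def tagset_variant(tag):
    """Guess the tagset variant from the tag length alone."""
    if len(tag) == 15:
        return "15-position tagset (SYN2020 and later)"
    if len(tag) == 16:
        return "16-position tagset (earlier corpora)"
    raise ValueError(f"unexpected tag length {len(tag)}: {tag!r}")


def legacy_aspect_symbol(tag):
    """Return the symbol in position 16 (Aspect) of a 16-position tag."""
    if len(tag) != 16:
        raise ValueError("aspect is stored in position 16 only in the older tagset")
    # Positions are 1-based in the documentation, so position 16 is index 15.
    return tag[15]


print(tagset_variant("NNFS6-----A----"))
# Made-up 16-character tag, used only to exercise the function.
print(legacy_aspect_symbol("NNFS6-----A-----"))
```

The sketch returns the raw symbol rather than a label, since the meaning of the individual Aspect values is defined by the tagset documentation.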
Different tagsets are used for different languages; descriptions of these tagsets are available separately. Some recent versions of the InterCorp parallel corpus have been annotated for morphological categories, syntactic functions and syntactic structure following the Universal Dependencies (UD) guidelines.