Morphological tags

A morphological tag (commonly just called a tag) is a summary of the grammatical information about a specific word (position) in a given context. A tag is usually generated automatically by morphological analysis followed by disambiguation.

Tags are positional attributes. A morphological tag in the Czech CNC corpora consists of a sequence of symbols (letters and numbers), each of which has a specific meaning determined by the position it occupies in the tag. In the Czech sentence Po promoci na londýnské universitě odjel jsem roku 1878 do Netley na školení vojenských chirurgů. the word form promoci (although this form is potentially morphologically ambiguous) has the morphological tag NNFS6-----A-----, which indicates that it is a feminine singular noun in the locative case, in its affirmative (non-negated) form.

Tagset

The set of rules and values that can occur in a tag is called a tagset. The positional tagset used in the Czech CNC corpora has 16 positions (a modified tagset is used starting from SYN2020), each of which carries information about a specific grammatical category:

  1. Word class
  2. A more detailed specification of word class
  3. Grammatical gender
  4. Number
  5. Case
  6. Possessor's gender
  7. Possessor's number
  8. Person
  9. Tense
  10. Degree
  11. Negation
  12. Active/passive
  13. not used
  14. not used
  15. Variant, stylistic marking etc.
  16. Aspect
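
The positional layout above can be illustrated with a short Python sketch that splits a 16-character tag into its positions. This is only an illustration, not a CNC tool: the function name decode_tag, the short labels in LABELS, and the assumption that a hyphen marks a position carrying no value (as in the example tag for promoci, assumed here to read NNFS6-----A----- with plain hyphens) are choices made for this example.

```python
# Hypothetical labels for the 16 positions, in the order of the list above.
LABELS = (
    "word class", "word class detail", "gender", "number", "case",
    "possessive (position 6)", "possessive number", "person", "tense",
    "degree", "negation", "active/passive", "unused 13", "unused 14",
    "variant/style", "aspect",
)

UNSET = "-"  # assumption: a hyphen means the position carries no value


def decode_tag(tag: str) -> dict[str, str]:
    """Return the positions of a positional tag that carry a value."""
    if len(tag) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} positions, got {len(tag)}")
    # Pair each symbol with the label of its position and skip empty positions.
    return {label: value for label, value in zip(LABELS, tag) if value != UNSET}


decoded = decode_tag("NNFS6-----A-----")
print(decoded)
```

For the example tag, only positions 1 to 5 and position 11 carry a value; all other positions are skipped. The sketch returns the raw symbols and does not interpret them, since the meaning of each symbol is defined by the tagset itself.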

Tagsets used in the parallel corpus InterCorp

Different tagsets are used for the various languages. Descriptions of these tagsets can be found here.