Differences

This shows you the differences between two versions of the page.

Old revision: en:cnk:syn2020 [2021/01/21 09:14] – [Multiple lemmatization and tagging (aggregate)] tomasjelinek
New revision: en:cnk:syn2020 [2021/01/21 16:49] – [Multiple lemmatization and tagging (aggregate)] michalkren
Line 108:
 ==== Multiple lemmatization and tagging (aggregate) ====
  
-In the SYN2020 corpus, **multiple lemmas and tags** for a special group of words, so-called **aggregates** ("multiword tokens" in the terminology of [[https://universaldependencies.org/|Universal Dependencies]]), are newly introduced. Aggregates are words that are written as one orthographic word in Czech, but from the point of view of syntax or specification of grammatical categories they behave as two orthographic words (exceptionally three). The aggregates concern conditional conjunctions (//aby//, //kdyby//), the connection of words with the enclitic form //s// (//dělalas//, //viděls//, //komus//, //vždyťs//), the connection of prepositions with some pronouns (//nač//, //očpak//, //zaň//), or a combination of words of the last two types (//načs//). For each of these words, two (or three) lemmas, sublemmas, tags and verbtags are specified at the same time according to their respective parts. For detailed information on aggregates, see the aggregate page.
+In the SYN2020 corpus, **multiple lemmas and tags** for a special group of words, so-called **aggregates** ("multiword tokens" in the [[https://universaldependencies.org/|Universal Dependencies]] terminology), are newly introduced. Aggregates are words that are written as one orthographic word in Czech, but from the point of view of syntax or specification of grammatical categories they behave as two orthographic words (exceptionally three). The aggregates concern conditional conjunctions (//aby//, //kdyby//), the connection of words with the enclitic form //s// (//dělalas//, //viděls//, //komus//, //vždyťs//), the connection of prepositions with some pronouns (//nač//, //očpak//, //zaň//), or a combination of words of the last two types (//načs//). For each of these words, two (or three) lemmas, sublemmas, tags and verbtags are specified at the same time according to their respective parts. For detailed information on aggregates, see the aggregate page.
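
As a rough illustration of the description above, an aggregate can be modelled as a single token that carries one lemma and one tag per part. The sketch below is not the corpus's actual storage format: the ''|'' separator, the ''Token'' class, the ''make_token'' helper, the lemma values and the placeholder tags ''TAG_A'' and ''TAG_B'' are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """One orthographic word with one lemma and one tag per part."""
    word: str
    lemmas: list[str]
    tags: list[str]

    @property
    def is_aggregate(self) -> bool:
        # An aggregate behaves as two (exceptionally three) words.
        return len(self.lemmas) > 1


def make_token(word: str, lemma_field: str, tag_field: str, sep: str = "|") -> Token:
    """Build a Token from multi-valued fields.

    The separator is a hypothetical choice for this sketch.
    """
    lemmas = lemma_field.split(sep)
    tags = tag_field.split(sep)
    if len(lemmas) != len(tags):
        raise ValueError("each part needs exactly one lemma and one tag")
    return Token(word, lemmas, tags)


# Illustrative values only; not checked against the corpus annotation.
aggregate = make_token("zaň", "za|on", "TAG_A|TAG_B")
plain = make_token("dům", "dům", "TAG_A")
```

Here ''aggregate.is_aggregate'' is true and ''plain.is_aggregate'' is false; sublemmas and verbtags could be added as further parallel lists in the same way.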
  
 ==== Automatic corpus annotation ====