~~NOTOC~~
====== Annotation of Multiword Expressions ======
Specialized tools are being developed for the automatic identification of multiword expressions (phrasemes and collocations) in corpora.
====== MWE lemmatization and tagging ======
Starting with the [[en:cnk:syn:verze14|SYNv14]] corpus, multiword expressions (MWEs) are annotated in corpora with new lemmas and tags linked to the [[https://db.korpus.cz/search/_lemur_simple|MWE database LEMUR]] (**L**exicon of **Mu**ltiword Exp**r**essions). The tagging is currently a pilot version and builds on the older [[seznamy:frazemy|phraseme annotation method using the FRANTA tool]] (in Czech).
Automatic tagging of MWEs has some **shortcomings**. First, the LEMUR database does not aim to be exhaustive, so many expressions are not included in it and are therefore not tagged. Second, some occurrences of expressions that are in the database may be missed (for example, because a non-standard realization of the expression was not detected), while, conversely, a literal use of the same words may be tagged as a phraseme (e.g. //Kocour si líže rány, které mu způsobil sousedův pes.// ‘The tomcat is licking the wounds inflicted on him by the neighbour's dog.’).
Two attributes are used for the annotation: **mwe_lemma** and **mwe_tag**:
* **mwe_lemma** (multiword expression lemma): the [[en:pojmy:lemma|lemma]] of an MWE, i.e. its dictionary entry in the basic form (nominative singular, infinitive, etc.), with the individual word forms separated by underscores; a typical value of the **mwe_lemma** attribute is thus ''bít_se_jako_lev'' (roughly ‘fight like a lion’). One entry may cover several lexical variants of the same MWE: e.g. the **mwe_lemma** ''bít_se_jako_lev'' covers the variants //bít se jako lev//, //rvát se jako lev// and //bránit se jako lev//.
* **mwe_tag** (multiword expression tag): a positional [[en:pojmy:tag|tag]] of an MWE consisting of 10 positions. For details, see [[seznamy:mwe#atribut_mwe_tag|the list of mwe_tag values]] (in Czech).
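As a rough illustration of the two value formats, the Python sketch below splits an **mwe_lemma** value into its word forms and checks that an **mwe_tag** value has 10 positions. It is a minimal sketch under stated assumptions: the attribute values are taken to be available as plain strings (e.g. copied from a concordance), the helper names ''parse_mwe_lemma'' and ''split_mwe_tag'' are hypothetical, and each tag position is assumed to be a single character, as in positional tags generally. The meaning of the individual positions is described only in the linked list and is not modelled here.

```python
# Minimal sketch (hypothetical helpers, not part of any CNK tool).
# Assumptions: attribute values are plain strings; each mwe_tag position
# is one character. Position meanings are deliberately not interpreted.

MWE_TAG_LENGTH = 10  # mwe_tag is a positional tag with 10 positions


def parse_mwe_lemma(mwe_lemma: str) -> list[str]:
    """Split an mwe_lemma value into its word forms (joined by underscores)."""
    return mwe_lemma.split("_")


def split_mwe_tag(mwe_tag: str) -> list[str]:
    """Return the positions of an mwe_tag value, assuming one character each."""
    if len(mwe_tag) != MWE_TAG_LENGTH:
        raise ValueError(
            f"expected {MWE_TAG_LENGTH} positions, got {len(mwe_tag)}"
        )
    return list(mwe_tag)


if __name__ == "__main__":
    words = parse_mwe_lemma("bít_se_jako_lev")
    print(words)            # ['bít', 'se', 'jako', 'lev']
    print(" ".join(words))  # bít se jako lev
```

Note that the lemma only records the basic form, so splitting it does not recover the other lexical variants (//rvát se jako lev//, //bránit se jako lev//) that the same entry covers.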
====== Older method of MWE lemmatization and tagging ======
The FRANTA tool (FRazémová ANotace a Textová Analýza ‘Phraseme annotation and text analysis’) was used for MWE annotation in the [[cnk:syn|SYN]] corpora, versions 4 to 13. More detailed information is available on the [[seznamy:frazemy|specialized page]] (in Czech).