# Morfio

The Morfio application estimates the extent and productivity of morphological models in Czech on the basis of corpus data. It is therefore a tool for morphological research, especially for the study of derivation.

Morfio is an online application and is accessible without registration to all users at morfio.korpus.cz.

The application includes a detailed manual and a description of the displayed results. As in other applications (e.g. SyD), Morfio provides a permanent link to the input query, which makes queries easy to share and cite.

## Principle

Generally, every morphological relation (setting aside the semantic component, which cannot be easily grasped computationally) rests on two formal factors:

1. a formal congruence/similarity in particular parts of the word, so-called bases (e.g. dřev- is held in common by the words dřevo and dřevěný)
2. formal differences in specific parts, so-called formants (morphs -o and -ěný from the previous example).

The aim of the Morfio application is to find all pairs (or groups of three or four) of units in the corpus that share an identical base and differ only in the specified formants (and, optionally, in selected phoneme alternations).

The morphological model in the example above corresponds to pairs of words in which the first is a noun ending in -o in the nom. sg. and the second is a word ending in -ený or -ěný in the nom. sg. masc. The base is left unspecified, but it must be identical in both words. The results contain pairs of lemmas (or word forms, depending on the initial settings) that fulfil the given condition both semantically and formally:

dřevo - dřevěný, olovo - olověný, síto - sítěný

However, we can also find cases where no morphological motivation can be identified:

milo - milený, živo - živený, dělo - dělený
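The formal matching described above can be illustrated with a minimal Python sketch. It is not Morfio's actual implementation: the function name `find_pairs`, its parameters, and the tiny word list are hypothetical, and part-of-speech and case restrictions are ignored for brevity. It shows only the core idea of stripping one formant, attaching another to the same base, and checking whether the resulting word exists.

```python
def find_pairs(lemmas, first_suffix, second_suffixes):
    """Return (first, second) pairs that share a base and differ only in the formant.

    Hypothetical helper for illustration; tags (noun, nom. sg., ...) are not checked.
    """
    lemma_set = set(lemmas)
    pairs = []
    for word in sorted(lemma_set):
        # Skip words that lack the first formant or consist of the formant alone.
        if not word.endswith(first_suffix) or len(word) == len(first_suffix):
            continue
        base = word[: len(word) - len(first_suffix)]
        for suffix in second_suffixes:
            candidate = base + suffix
            if candidate in lemma_set:
                pairs.append((word, candidate))
    return pairs


# Toy word list standing in for corpus lemmas (an assumption, not real corpus data).
lemmas = ["dřevo", "dřevěný", "olovo", "olověný", "síto", "sítěný",
          "dělo", "dělený", "město", "starý"]
pairs = find_pairs(lemmas, "o", ["ený", "ěný"])
for first, second in pairs:
    print(first, "-", second)
```

The sketch returns dělo - dělený alongside dřevo - dřevěný, because a purely formal match cannot tell motivated pairs from accidental ones.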

Unlike the regular onomasiological approach (meaning → form), when working with a corpus that is not semantically tagged we must begin with the form (a semasiological approach). This can pose several problems (e.g. in the case of homonymy) whose solutions are outside the scope of this tool and require manual analysis carried out by linguists.

The output of the Morfio application is not, and cannot be, free of errors; without further adjustment, revision and linguistic interpretation it does not yield directly publishable results. It is rather a tool that processes large amounts of data for linguistic purposes, making analysis faster and more accessible for researchers. As with other corpus search engines, the aim is only to achieve 100% recall when searching for a given type and to present the results in an organized way. Their relevance and precision are left solely to the user's judgement, i.e. to the formulation of the query and the subsequent interpretation of the findings.

### Alternations

Requiring exact formal identity between the individual members of a morphological model would be impractical for a language such as Czech, because regular phoneme alternations in the base are frequent in morphological processes. The Morfio application takes alternations into account, and their range can be set (e.g. the alternation of e – é in letět – létat; i – e in prosit – prošen; r – ř in starý – stařec; sk – šť in český – čeština etc.).
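One way to picture matching with alternations is the following Python sketch. It is an assumption-laden illustration rather than a description of how Morfio works internally: the helpers `base_variants` and `find_pairs_with_alternations`, the representation of alternations as string pairs, and the word list are all hypothetical. Each base is expanded into variants with a single alternation applied, and every variant is tried with the second formant.

```python
def base_variants(base, alternations):
    """Return the base plus every variant with one alternation applied once."""
    variants = {base}
    for source, target in alternations:
        start = base.find(source)
        while start != -1:
            variants.add(base[:start] + target + base[start + len(source):])
            start = base.find(source, start + 1)
    return variants


def find_pairs_with_alternations(lemmas, first_suffix, second_suffix, alternations):
    """Find pairs whose bases are identical up to one permitted alternation."""
    lemma_set = set(lemmas)
    pairs = []
    for word in sorted(lemma_set):
        if not word.endswith(first_suffix) or len(word) == len(first_suffix):
            continue
        base = word[: len(word) - len(first_suffix)]
        for variant in sorted(base_variants(base, alternations)):
            candidate = variant + second_suffix
            if candidate in lemma_set:
                pairs.append((word, candidate))
    return pairs


# Toy data (assumed for illustration only).
lemmas = ["starý", "stařec", "letět", "létat", "nový"]
alternations = [("r", "ř"), ("e", "é")]

print(find_pairs_with_alternations(lemmas, "ý", "ec", alternations))
print(find_pairs_with_alternations(lemmas, "ět", "at", alternations))
```

With an empty alternation list the same call finds nothing for starý, which illustrates why a strict identity requirement would miss pairs such as starý – stařec.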

## Application images

Query and output screen