<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="FeedCreator 1.8" -->
<?xml-stylesheet href="http://wiki.korpus.cz/lib/exe/css.php?s=feed" type="text/css"?>
<rdf:RDF
    xmlns="http://purl.org/rss/1.0/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel rdf:about="http://wiki.korpus.cz/feed.php">
        <title>Příručka ČNK - en:pojmy</title>
        <description>Knowledge base of corpus linguistics</description>
        <link>http://wiki.korpus.cz/</link>
        <image rdf:resource="http://wiki.korpus.cz/lib/exe/fetch.php/wiki:dokuwiki.svg" />
        <dc:date>2026-04-30T02:59:37+00:00</dc:date>
        <items>
            <rdf:Seq>
                <rdf:li rdf:resource="http://wiki.korpus.cz/doku.php/en:pojmy:anotace_mwe?rev=1769381291&amp;do=diff"/>
                <rdf:li rdf:resource="http://wiki.korpus.cz/doku.php/en:pojmy:arf?rev=1481556142&amp;do=diff"/>
                <rdf:li rdf:resource="http://wiki.korpus.cz/doku.php/en:pojmy:din?rev=1571165949&amp;do=diff"/>
                <rdf:li rdf:resource="http://wiki.korpus.cz/doku.php/en:pojmy:dotazovaci_jazyk?rev=1608573937&amp;do=diff"/>
                <rdf:li rdf:resource="http://wiki.korpus.cz/doku.php/en:pojmy:frekvence?rev=1597070421&amp;do=diff"/>
                <rdf:li rdf:resource="http://wiki.korpus.cz/doku.php/en:pojmy:ipm?rev=1465803378&amp;do=diff"/>
                <rdf:li rdf:resource="http://wiki.korpus.cz/doku.php/en:pojmy:konkordance?rev=1465806268&amp;do=diff"/>
                <rdf:li rdf:resource="http://wiki.korpus.cz/doku.php/en:pojmy:kwic?rev=1707383712&amp;do=diff"/>
                <rdf:li rdf:resource="http://wiki.korpus.cz/doku.php/en:pojmy:lemma?rev=1650456462&amp;do=diff"/>
                <rdf:li rdf:resource="http://wiki.korpus.cz/doku.php/en:pojmy:lexikalni_bohatost?rev=1729278466&amp;do=diff"/>
                <rdf:li rdf:resource="http://wiki.korpus.cz/doku.php/en:pojmy:prehled_pojmu?rev=1719002469&amp;do=diff"/>
                <rdf:li rdf:resource="http://wiki.korpus.cz/doku.php/en:pojmy:regularni_vyrazy?rev=1462268485&amp;do=diff"/>
                <rdf:li rdf:resource="http://wiki.korpus.cz/doku.php/en:pojmy:syntakticka_analyza?rev=1768815947&amp;do=diff"/>
                <rdf:li rdf:resource="http://wiki.korpus.cz/doku.php/en:pojmy:syntakticka_komplexita?rev=1729276792&amp;do=diff"/>
                <rdf:li rdf:resource="http://wiki.korpus.cz/doku.php/en:pojmy:tag?rev=1768561476&amp;do=diff"/>
                <rdf:li rdf:resource="http://wiki.korpus.cz/doku.php/en:pojmy:ud?rev=1728417031&amp;do=diff"/>
                <rdf:li rdf:resource="http://wiki.korpus.cz/doku.php/en:pojmy:word?rev=1481202263&amp;do=diff"/>
            </rdf:Seq>
        </items>
    </channel>
    <image rdf:about="http://wiki.korpus.cz/lib/exe/fetch.php/wiki:dokuwiki.svg">
        <title>Příručka ČNK</title>
        <link>http://wiki.korpus.cz/</link>
        <url>http://wiki.korpus.cz/lib/exe/fetch.php/wiki:dokuwiki.svg</url>
    </image>
    <item rdf:about="http://wiki.korpus.cz/doku.php/en:pojmy:anotace_mwe?rev=1769381291&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2026-01-25T22:48:11+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>anotace_mwe</title>
        <link>http://wiki.korpus.cz/doku.php/en:pojmy:anotace_mwe?rev=1769381291&amp;do=diff</link>
        <description>Annotation of Multiword Expressions

Specialized tools are being developed for the automatic identification of multiword expressions (phrasemes and collocations) in corpora. 

MWE lemmatization and tagging

Starting with the SYNv14 corpus, multiword expressions are annotated in corpora using new lemmas and tags linked to the</description>
    </item>
    <item rdf:about="http://wiki.korpus.cz/doku.php/en:pojmy:arf?rev=1481556142&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2016-12-12T15:22:22+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>arf</title>
        <link>http://wiki.korpus.cz/doku.php/en:pojmy:arf?rev=1481556142&amp;do=diff</link>
        <description>ARF (average reduced frequency)

ARF is one of the many adjusted frequencies of a word form in a corpus. Adjusted frequencies correct the simple frequency (number of occurrences) of a given word or phenomenon by how evenly its occurrences are distributed across the corpus, i.e. they take dispersion into account. $$ARF = \frac{1}{v} \sum_{i=1}^{f} \min (d_{i}, v)$$ where $f$ is the frequency of the word form, $N$ is the size of the corpus, $d_{i}$ is the distance between consecutive occurrences, and $v = \frac{N}{f}$ is the average distance between occurrences. If $d_{i} = v$ for every $i$, the ARF equals $f$.</description>
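        <dc:description>A minimal Python sketch of the ARF formula above, offered as an illustration rather than the implementation used in any CNC tool. The function name arf and its parameters are hypothetical. It also assumes that the distances d_i are measured cyclically, so that the first distance wraps around from the last occurrence to the first; the excerpt does not state this.

```python
def arf(positions: list[int], corpus_size: int) -> float:
    """Average reduced frequency from 0-based token positions of a word form."""
    f = len(positions)
    if f == 0:
        return 0.0
    v = corpus_size / f  # average distance between occurrences, v = N / f
    pos = sorted(positions)
    # Assumption: distances are cyclic, so the first one wraps around
    # from the last occurrence past the end of the corpus to the first.
    distances = [pos[0] + corpus_size - pos[-1]]
    distances += [b - a for a, b in zip(pos, pos[1:])]
    # Each distance contributes at most v, so clustered occurrences count less.
    return sum(min(d, v) for d in distances) / v


print(arf([0, 25, 50, 75], 100))   # evenly spread: 4.0
print(arf([10, 11, 12, 13], 100))  # clustered: 1.12
```

Four evenly spread occurrences keep their full frequency of 4, while four adjacent occurrences are reduced to just over 1.</dc:description>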
    </item>
    <item rdf:about="http://wiki.korpus.cz/doku.php/en:pojmy:din?rev=1571165949&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2019-10-15T18:59:09+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>din</title>
        <link>http://wiki.korpus.cz/doku.php/en:pojmy:din?rev=1571165949&amp;do=diff</link>
        <description>DIN

The DIN (Difference index) is a so-called effect-size metric, i.e. a measure designed to quantify the relevance of differences between values. The DIN is used in the KWords tool to extract prominent units (keywords) from a text. $$DIN = 100 \times \frac{RelFq(Ttxt) - RelFq(RefC)}{RelFq(Ttxt) + RelFq(RefC)}$$</description>
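        <dc:description>A minimal Python sketch of the DIN formula above, not the KWords implementation. The function and parameter names are hypothetical. It assumes that RelFq(Ttxt) and RelFq(RefC) are the relative frequencies of the unit in the target text and in the reference corpus, expressed in the same units.

```python
def din(rel_freq_target: float, rel_freq_reference: float) -> float:
    """Difference index from two relative frequencies in the same units."""
    total = rel_freq_target + rel_freq_reference
    if total == 0:
        # The unit occurs in neither source, so the index is undefined.
        raise ValueError("both relative frequencies are zero")
    return 100 * (rel_freq_target - rel_freq_reference) / total


print(din(30.0, 10.0))  # more frequent in the target text: 50.0
print(din(10.0, 10.0))  # equally frequent in both: 0.0
print(din(5.0, 0.0))    # absent from the reference corpus: 100.0
```

Under these assumptions the value ranges from -100 (the unit occurs only in the reference corpus) to 100 (the unit occurs only in the target text).</dc:description>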
    </item>
    <item rdf:about="http://wiki.korpus.cz/doku.php/en:pojmy:dotazovaci_jazyk?rev=1608573937&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2020-12-21T18:05:37+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>dotazovaci_jazyk</title>
        <link>http://wiki.korpus.cz/doku.php/en:pojmy:dotazovaci_jazyk?rev=1608573937&amp;do=diff</link>
        <description>Query language

Query languages are used to query database systems in information technologies; every system uses a query language with precisely defined syntax. 

For work with language corpora, the query language is used for inputting queries into</description>
    </item>
    <item rdf:about="http://wiki.korpus.cz/doku.php/en:pojmy:frekvence?rev=1597070421&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2020-08-10T14:40:21+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>frekvence</title>
        <link>http://wiki.korpus.cz/doku.php/en:pojmy:frekvence?rev=1597070421&amp;do=diff</link>
        <description>Frequency

In corpus linguistics, frequency is the number of times a given form or phenomenon occurs in the corpus. It is either given as an absolute value, e.g. the lemma pes occurs 17 701 times in the 100-million-word corpus SYN2010, or as a relative value, e.g. the lemma $REL = \frac{ABS}{N} \times 1000000$ $rr = \frac{r}{n}$ $E = p(A) \times N$ $p(\text{škola}) = \frac{f(\text{škola})}{N} = \frac{47872}{122419382} = 0.0003910492 = 3.91 \cdot 10^{-4}$ $E(\text{škola}) = p(\text{škola}) \time…</description>
    </item>
    <item rdf:about="http://wiki.korpus.cz/doku.php/en:pojmy:ipm?rev=1465803378&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2016-06-13T07:36:18+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>ipm</title>
        <link>http://wiki.korpus.cz/doku.php/en:pojmy:ipm?rev=1465803378&amp;do=diff</link>
        <description>ipm

The abbreviations ipm (instances per million) and ppm (parts per million) are measures of relative frequency. They express the average number of occurrences of a unit or word in a hypothetical text/corpus of 1 million words.
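
The normalization described above can be sketched in Python as follows. This is an illustration only: the function name ipm, its parameters and the numbers in the usage lines are hypothetical and are not taken from any CNC corpus.

```python
def ipm(absolute_frequency: int, corpus_size: int) -> float:
    """Instances per million: occurrences scaled to a corpus of 1 million words."""
    if not corpus_size > 0:
        raise ValueError("corpus_size must be positive")
    # Multiply before dividing to limit floating-point rounding.
    return absolute_frequency * 1_000_000 / corpus_size


# Hypothetical figures: 250 occurrences in a corpus of 5 million words.
print(ipm(250, 5_000_000))  # 50.0
```

Because the value is normalized by corpus size, ipm figures can be compared across corpora of different sizes.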

E.g. the node form běžeckých</description>
    </item>
    <item rdf:about="http://wiki.korpus.cz/doku.php/en:pojmy:konkordance?rev=1465806268&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2016-06-13T08:24:28+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>konkordance</title>
        <link>http://wiki.korpus.cz/doku.php/en:pojmy:konkordance?rev=1465806268&amp;do=diff</link>
        <description>Concordance

A concordance represents all events (occurrences) of the searched phenomenon in the corpus along with the surrounding context. In practice, within the concordance we single out the KWIC (i.e. key word in context), which is the searched word/phenomenon and its right and left context. One line of the concordance list is called a concordance line.</description>
    </item>
    <item rdf:about="http://wiki.korpus.cz/doku.php/en:pojmy:kwic?rev=1707383712&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2024-02-08T09:15:12+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>kwic</title>
        <link>http://wiki.korpus.cz/doku.php/en:pojmy:kwic?rev=1707383712&amp;do=diff</link>
        <description>KWIC

KWIC is the English abbreviation of key word in context, which is used to label the search term (or a sequence of terms) in contexts of various sizes. The Czech equivalent keyword is homonymous with the term denoting items which are prominent thanks to their frequency in the text, serving as a basis for text analysis. (</description>
    </item>
    <item rdf:about="http://wiki.korpus.cz/doku.php/en:pojmy:lemma?rev=1650456462&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2022-04-20T12:07:42+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>lemma</title>
        <link>http://wiki.korpus.cz/doku.php/en:pojmy:lemma?rev=1650456462&amp;do=diff</link>
        <description>Lemma

A lemma is the representative dictionary form of a word. In the process of lemmatization during automatic language processing, it is the form assigned to every form of the given word in the corpus. 

Approaches to lemmatization can differ in specific details, but it is generally the case that:</description>
    </item>
    <item rdf:about="http://wiki.korpus.cz/doku.php/en:pojmy:lexikalni_bohatost?rev=1729278466&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2024-10-18T19:07:46+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>lexikalni_bohatost</title>
        <link>http://wiki.korpus.cz/doku.php/en:pojmy:lexikalni_bohatost?rev=1729278466&amp;do=diff</link>
        <description>Lexical Diversity

	*  InterCorp release 16ud is annotated with the following two measures of lexical diversity. They are specified as metadata for each text of sufficient length, for each linguistically annotated language:
		*  lexDivWord: average number of different word forms per 1000 tokens</description>
    </item>
    <item rdf:about="http://wiki.korpus.cz/doku.php/en:pojmy:prehled_pojmu?rev=1719002469&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2024-06-21T20:41:09+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>prehled_pojmu</title>
        <link>http://wiki.korpus.cz/doku.php/en:pojmy:prehled_pojmu?rev=1719002469&amp;do=diff</link>
        <description>Corpus linguistics – key terminology

See, e.g., Corpus Linguistics Glossary (Kent State University, Ohio) or A Glossary of Corpus Linguistics (by Paul Baker, Andrew Hardie and Tony McEnery, Edinburgh University Press, 2006).

C

Complexity

D

Diversity

L

Lexical Diversity

S

Syntactic Complexity

U

UD

Universal Dependencies</description>
    </item>
    <item rdf:about="http://wiki.korpus.cz/doku.php/en:pojmy:regularni_vyrazy?rev=1462268485&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2016-05-03T09:41:25+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>regularni_vyrazy</title>
        <link>http://wiki.korpus.cz/doku.php/en:pojmy:regularni_vyrazy?rev=1462268485&amp;do=diff</link>
        <description>Regular expressions

Regular expressions (the term comes from a theory of formal languages, but its meaning as it is used in IT is slightly different) allow us to accurately describe the set of text strings matching the search term or phenomenon. For these purposes,</description>
    </item>
    <item rdf:about="http://wiki.korpus.cz/doku.php/en:pojmy:syntakticka_analyza?rev=1768815947&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2026-01-19T09:45:47+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>syntakticka_analyza</title>
        <link>http://wiki.korpus.cz/doku.php/en:pojmy:syntakticka_analyza?rev=1768815947&amp;do=diff</link>
        <description>Syntactic analysis and syntactic tagging

Some CNC corpora (the first of which is SYN2015) are syntactically annotated, marking dependency relations between two words in a sentence and the analytical functions of individual words. This syntactic annotation is based on the principles of the analytical-layer annotation used in the</description>
    </item>
    <item rdf:about="http://wiki.korpus.cz/doku.php/en:pojmy:syntakticka_komplexita?rev=1729276792&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2024-10-18T18:39:52+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>syntakticka_komplexita</title>
        <link>http://wiki.korpus.cz/doku.php/en:pojmy:syntakticka_komplexita?rev=1729276792&amp;do=diff</link>
        <description>Syntactic Complexity

InterCorp release 16ud is annotated with several measures of syntactic complexity. They are specified as metadata for each sentence and each text, for each linguistically annotated language. In KonText, they can be displayed and queried like any other metadata items, such as text author or sentence ID.</description>
    </item>
    <item rdf:about="http://wiki.korpus.cz/doku.php/en:pojmy:tag?rev=1768561476&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2026-01-16T11:04:36+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>tag</title>
        <link>http://wiki.korpus.cz/doku.php/en:pojmy:tag?rev=1768561476&amp;do=diff</link>
        <description>Morphological tags

A morphological tag (commonly called a tag) is a summary of the grammatical information about a specific word (position) in the given context. A tag is usually generated automatically on the basis of a morphological analysis and subsequent disambiguation.

Tags are positional attributes. A morphological tag in the Czech CNC corpora consists of a sequence of symbols (letters and numbers) which have a specific meaning based on the position which they occupy in the code. In the Czech…</description>
    </item>
    <item rdf:about="http://wiki.korpus.cz/doku.php/en:pojmy:ud?rev=1728417031&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2024-10-08T19:50:31+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>ud</title>
        <link>http://wiki.korpus.cz/doku.php/en:pojmy:ud?rev=1728417031&amp;do=diff</link>
        <description>Universal Dependencies – UD

Universal Dependencies is an open international project aiming at linguistic annotation that is consistent across different languages. Some recent versions of the InterCorp parallel corpus (13ud and 16ud) have been annotated in terms of morphological categories, syntactic functions and syntactic structure following the UD guidelines and using the tools developed within the UD project.</description>
    </item>
    <item rdf:about="http://wiki.korpus.cz/doku.php/en:pojmy:word?rev=1481202263&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2016-12-08T13:04:23+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>word</title>
        <link>http://wiki.korpus.cz/doku.php/en:pojmy:word?rev=1481202263&amp;do=diff</link>
        <description>Word form (word)

A word form (known as a word in corpus terminology) is a unit which remains morphologically (and possibly also orthographically) specific. With its generality it stands between a token and a lemma.

While a token is one specific realization of a given unit, a word form is a standardized unit; a</description>
    </item>
</rdf:RDF>
