Differences

This shows you the differences between two versions of the page.

en:pojmy:syntakticka_analyza [2020/08/27 17:11] – [Examples of syntactic structure] veronikapojarova
en:pojmy:syntakticka_analyza [2020/12/17 11:59] – [Automatic syntactic annotation: parsing] tomasjelinek
Line 1: Line 1:
 ====== Syntactic analysis and syntactic tagging ======
  
-Some of CNC's corpora (the first of which is [[en:cnk:syn2015|SYN2015]]) are syntactically annotated, marking dependency relations between two words in a sentence and the analytical functions of individual words. This syntactic annotation is based on the principles of the analytical-layer annotation used in the [[http://ufal.mff.cuni.cz/pdt2.0/index-cz.html|Prague Dependency Treebank]] (PDT).
+Some CNC corpora (the first of which is [[en:cnk:syn2015|SYN2015]]) are syntactically annotated: the annotation marks dependency relations between pairs of words in a sentence and the analytical functions of individual words. This syntactic annotation is based on the principles of the analytical-layer annotation used in the [[http://ufal.mff.cuni.cz/pdt2.0/index-cz.html|Prague Dependency Treebank]] (PDT).
  
 ===== The system of syntactic annotation: the analytical layer of the Prague Dependency Treebank =====
Line 9: Line 9:
 ==== Automatic syntactic annotation: parsing ====
  
-Syntactic annotation is done automatically, using a stochastic program ([[en:pojmy:parser|parser]]), in this case the TurboParser program. This kind of annotation has a much lower error rate than [[en:pojmy:morfologicka_analyza|morphological annotation]]. Approximately 1/6 of [[en:pojmy:token|tokens]] are left without a correctly identified „parent“ or correctly matched syntactic function. The success rate of parent identification, i.e. UAS (unlabeled attachment score), is 88,48 %; the success rate of both parent and syntactic function identification, i.e. LAS (labeled attachment score), is 82.46%. Therefore, although syntactic annotation can be used as an **approximate guide for further language research**, we must keep in mind that it is not entirely reliable. The error rate is higher for less common syntactic functions and constructions, whereas the most frequent functions in expected contexts have an error rate lower than 10%.
+Syntactic annotation is done automatically, using a syntactic [[en:pojmy:parser|parser]]. SYN2015 was annotated with TurboParser; SYN2020 was annotated with a neural parser from the NeuroNLP2 tools. This kind of annotation has a higher error rate than [[en:pojmy:morfologicka_analyza|morphological annotation]]. In SYN2020, more than 1/10 of [[en:pojmy:token|tokens]] are left without a correctly identified "parent" or a correctly assigned syntactic function; in SYN2015, the share is as high as 1/6 of [[en:pojmy:token|tokens]].\\ 
 +The success rate of parsing is measured as UAS (unlabeled attachment score), the rate of successful parent identification, and LAS (labeled attachment score), the rate of successful identification of both parent and syntactic function. In SYN2015 and SYN2020, these rates are as follows: 
 + 
 +^ corpus ^ UAS ^ LAS ^ 
 +| SYN2015 | 88.48% | 82.46% | 
 +| SYN2020 | 92.39% | 88.73% | 
 + 
 + 
 +Therefore, although syntactic annotation can be used as an **approximate guide for further language research**, we must keep in mind that it is not entirely reliable. The error rate is higher for less common syntactic functions and constructions, whereas the most frequent functions in expected contexts have an error rate lower than 5% (SYN2020) or 10% (SYN2015).
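
The UAS and LAS measures described above can be illustrated with a minimal sketch. It assumes each token is represented as a pair of parent position and syntactic function; the function name ''attachment_scores'' and the toy sentence are hypothetical and are not part of any CNC tool.

```python
from typing import List, Tuple

# Hypothetical representation: (parent position, afun label) for each token.
Token = Tuple[int, str]


def attachment_scores(gold: List[Token], predicted: List[Token]) -> Tuple[float, float]:
    """Return (UAS, LAS) as fractions between 0 and 1."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty token list")
    # UAS: the parent alone must be identified correctly.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS: both the parent and the syntactic function must be correct.
    las_hits = sum(g == p for g, p in zip(gold, predicted))
    return uas_hits / len(gold), las_hits / len(gold)


# Toy data (illustrative only): four tokens, one wrong parent, one wrong function.
gold = [(2, "Sb"), (0, "Pred"), (4, "Atr"), (2, "Obj")]
predicted = [(2, "Sb"), (0, "Pred"), (2, "Atr"), (2, "Adv")]

uas, las = attachment_scores(gold, predicted)
print(f"UAS = {uas:.2%}, LAS = {las:.2%}")  # UAS = 75.00%, LAS = 50.00%
```

A token with a correct parent but a wrong function counts towards UAS only, which is why LAS can never exceed UAS.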
  
 [{{ :pojmy:mf041122_color.jpg?400|}}]
Line 24: Line 32:
 ===== Searching KonText for syntactic structures: syntactic attributes =====
  
-Searching in syntactically annotated corpora typically requires an interface specially designed to display the syntactic structure, for example the program [[https://ufal.mff.cuni.cz/tred/|TrEd]]. The [[en:manualy:kontext|KonText]] interface does not offer the option of viewing the syntactic structure, nonetheless it is possible to search for words and phrases according to syntactic parameters. For this purpose, each token is assigned several [[en:pojmy:atributy_pozicni|attributes]], in addition to the smaller number of attributes which are assigned only to selected tokens. All syntactic attributes are described in a [[en:seznamy:syntakticke_znacky|separate entry]]. The basic syntactic attributes assigned to all tokens are: 
+Searching in syntactically annotated corpora typically requires an interface specially designed to display the syntactic structure, for example the program [[https://ufal.mff.cuni.cz/tred/|TrEd]]. The [[en:manualy:kontext:index|KonText]] interface does not offer the option of viewing the syntactic structure, nonetheless it is possible to search for words and phrases according to syntactic parameters. For this purpose, each token is assigned several [[en:pojmy:atributy_pozicni|attributes]], in addition to the smaller number of attributes which are assigned only to selected tokens. All syntactic attributes are described in a [[en:seznamy:syntakticke_znacky|separate entry]]. The basic syntactic attributes assigned to all tokens are: 
   * [[en:seznamy:parent|parent]] (numbered reference to the position of the governing token) 
   * [[en:seznamy:afun|afun]] (syntactic function)
Line 42: Line 50:
 ''%%[afun="Obj" & tag="NN..4.*" & p_lemma="převážet"]%%''
  
-We can also search fo all words (syntactic nouns) in the 7th case (instrumental) with the preposition //mezi// which are dependent on a verb in the infinitive: ''%%[prep="mezi" & case="7" & ep_tag="Vf.*"]%%''.
+We can also search for all words (syntactic nouns) in the 7th case (instrumental) with the preposition //mezi// which are dependent on a verb in the infinitive: ''%%[prep="mezi" & case="7" & ep_tag="Vf.*"]%%''.
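
Queries of this shape can also be assembled programmatically, which may help when many attribute combinations are needed. The following is a minimal sketch only: the helper ''cql_token'' is hypothetical and not part of KonText, and it assumes that attribute values contain no double quotes.

```python
def cql_token(**attrs: str) -> str:
    """Build a single-token query from attribute=value pairs joined by '&'."""
    for name, value in attrs.items():
        if '"' in value:
            # Assumption: quoting rules are not handled here, so reject such values.
            raise ValueError(f"value of {name!r} must not contain a double quote")
    conditions = [f'{name}="{value}"' for name, value in attrs.items()]
    return "[" + " & ".join(conditions) + "]"


# The two queries from the text, rebuilt from their attribute values.
object_query = cql_token(afun="Obj", tag="NN..4.*", p_lemma="převážet")
mezi_query = cql_token(prep="mezi", case="7", ep_tag="Vf.*")

print(object_query)  # [afun="Obj" & tag="NN..4.*" & p_lemma="převážet"]
print(mezi_query)    # [prep="mezi" & case="7" & ep_tag="Vf.*"]
```

The resulting strings can be pasted into the query field as they are; the helper only concatenates conditions and does not check that the attribute names exist in a given corpus.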
  
  --- //Tomáš Jelínek//