Differences

This shows you the differences between two versions of the page.

en:pojmy:syntakticka_analyza [2020/08/27 16:44] – [Příklad syntaktické struktury] veronikapojarovaen:pojmy:syntakticka_analyza [2022/08/13 13:08] (current) – [Syntactic analysis and syntactic tagging] alexandrrosen
Line 1: Line 1:
 ====== Syntactic analysis and syntactic tagging ====== ====== Syntactic analysis and syntactic tagging ======
  
-Some of CNC'corpora (the first of which is [[en:cnk:syn2015|SYN2015]]) are syntactically annotated, marking dependency relations between two words in a sentence and the analytical functions of individual words. This syntactic annotation is based on the principles of the analytical-layer annotation used in the [[http://ufal.mff.cuni.cz/pdt2.0/index-cz.html|Prague Dependency Treebank]] (PDT).+Some CNC corpora (the first of which is [[en:cnk:syn2015|SYN2015]]) are syntactically annotated, marking dependency relations between two words in a sentence and the analytical functions of individual words. This syntactic annotation is based on the principles of the analytical-layer annotation used in the [[http://ufal.mff.cuni.cz/pdt2.0/index-cz.html|Prague Dependency Treebank]] (PDT). The [[en:cnk:intercorp|InterCorp]] parallel corpus in its release [[en:cnk:intercorp:verze13ud|13ud]] is syntactically (and also morphologically) annotated in an alternative way, following the guidelines of the international [[en:pojmy:ud|Universal Dependencies]] project.
  
 ===== The system of syntactic annotation: the analytical layer of the Prague Dependency Treebank ===== ===== The system of syntactic annotation: the analytical layer of the Prague Dependency Treebank =====
Line 9: Line 9:
 ==== Automatic syntactic annotation: parsing ==== ==== Automatic syntactic annotation: parsing ====
  
-Syntactic annotation is done automatically, using a stochastic program ([[en:pojmy:parser|parser]]), in this case the TurboParser program. This kind of annotation has a much lower error rate than [[en:pojmy:morfologicka_analyza|morphological annotation]]. Approximately 1/[[en:pojmy:token|tokens]] are left without a correctly identified „parent“ or correctly matched syntactic function. The success rate of parent identification, i.e. UAS (unlabeled attachment score), is 88,48 %; the success rate of both parent and syntactic function identification, i.e. LAS (labeled attachment score), is 82.46%Therefore, although syntactic annotation can be used as an **approximate guide for further language research**, we must keep in mind that it is not entirely reliable. The error rate is higher for less common syntactic functions and constructions, whereas the most frequent functions in expected contexts have an error rate lower than 10%.+Syntactic annotation is done automatically, using a syntactic [[en:pojmy:parser|parser]]. The SYN2015 corpus was annotated with TurboParser; SYN2020 was annotated with a neural parser from the NeuroNLP2 tools. This kind of annotation has a higher error rate than [[en:pojmy:morfologicka_analyza|morphological annotation]]. In SYN2020, more than 1/10 of [[en:pojmy:token|tokens]] are left without a correctly identified „parent“ or a correctly matched syntactic function; in SYN2015, it is as much as 1/6 of [[en:pojmy:token|tokens]].\\ 
 +The success rate of parsing is measured as UAS (unlabeled attachment score), the rate of successful parent identification, and LAS (labeled attachment score), the rate of successful identification of both parent and syntactic function. In SYN2015 and SYN2020, these rates are as follows: 
 + 
 +^ Corpus ^ UAS ^ LAS ^ 
 +| SYN2015 | 88.48 % | 82.46 % | 
 +| SYN2020 | 92.39 % | 88.73 % | 
 + 
 + 
 +Therefore, although syntactic annotation can be used as an **approximate guide for further language research**, we must keep in mind that it is not entirely reliable. The error rate is higher for less common syntactic functions and constructions, whereas the most frequent functions in expected contexts have an error rate lower than 5% (SYN2020) or 10% (SYN2015).
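The two scores can be illustrated with a minimal sketch. It assumes each token's annotation is a pair of parent position and syntactic function; the function name ''attachment_scores'' and the four-token sample are hypothetical and are not taken from any CNC tool or corpus.

```python
from typing import List, Tuple

# One token's annotation: (position of the parent token, syntactic function).
# This pair representation is an assumption made for the example.
Annotation = Tuple[int, str]


def attachment_scores(gold: List[Annotation],
                      predicted: List[Annotation]) -> Tuple[float, float]:
    """Return (UAS, LAS) in percent for two equally long annotation lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sample")
    # UAS counts tokens whose parent is correct, whatever the function label.
    parent_ok = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS counts tokens whose parent AND function are both correct.
    both_ok = sum(g == p for g, p in zip(gold, predicted))
    total = len(gold)
    return 100.0 * parent_ok / total, 100.0 * both_ok / total


# Hypothetical sample: the third token gets a wrong function,
# the fourth token gets a wrong parent.
gold = [(2, "Sb"), (0, "Pred"), (2, "Obj"), (0, "AuxK")]
predicted = [(2, "Sb"), (0, "Pred"), (2, "Adv"), (2, "AuxK")]

uas, las = attachment_scores(gold, predicted)
print(f"UAS = {uas:.2f} %, LAS = {las:.2f} %")
```

Because LAS requires everything UAS requires plus a correct function label, LAS can never exceed UAS, which agrees with the figures reported for both corpora.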
  
 [{{ :pojmy:mf041122_color.jpg?400|}}] [{{ :pojmy:mf041122_color.jpg?400|}}]
Line 15: Line 23:
 ===== Syntactic dependency structure ===== ===== Syntactic dependency structure =====
  
-In dependency-based syntactic annotation, each token is assigned one „parent“, i.e. another token on which the given token is dependent, or alternatively the „root“ of the sentence, an external parent representing the entire sentence (e.g. the predicate in the main clause is dependent on the sentence „root“). One syntactic tag is also assigned to each token. Syntactic tags partially correnspond to the usual syntactic functions such as predicate (Pred), subject (Sb), attribute (Atr) etc., and partially they have auxiliary functions, most often assigned to synsemantic words (e.g. AuxP for prepositions) and punctuation marks (AuxK for punctuation marks at the end of a sentence).+In dependency-based syntactic annotation, each token is assigned one „parent“, i.e. another token on which the given token is dependent, or alternatively the „root“ of the sentence, an external parent representing the entire sentence (e.g. the predicate in the main clause is dependent on the sentence „root“). One syntactic tag is also assigned to each token. Syntactic tags partially correspond to the usual syntactic functions such as predicate (Pred), subject (Sb), attribute (Atr) etc., and partially they have auxiliary functions, most often assigned to synsemantic words (e.g. AuxP for prepositions) and punctuation marks (AuxK for punctuation marks at the end of a sentence).
 ==== Examples of syntactic structure ==== ==== Examples of syntactic structure ====
  
-The syntactic structure of a sentence can be illustrated using the example //Plavidlo bude převážet turisty mezi minaretem a zříceninou Janohrad v parku.// The sentence is shown as a dependency tree, where the branches represent the dependency relations between the indicidual words. In the dependency tree, tokens with basic syntax functions are written in <fc #ff0000>red</fc>, tokens with auxiliary functions are <fc #008000>green</fc> and graphic symbols are <fc #dddd00>yellow</fc>.+The syntactic structure of a sentence can be illustrated using the example //Plavidlo bude převážet turisty mezi minaretem a zříceninou Janohrad v parku.// The sentence is shown as a dependency tree, where the branches represent the dependency relations between the individual words. In the dependency tree, tokens with basic syntax functions are written in <fc #ff0000>red</fc>, tokens with auxiliary functions are <fc #008000>green</fc> and graphic symbols are <fc #dddd00>yellow</fc>. 
 + 
 +The technical root of the dependency tree (top left, with the sentence identifier) governs the predicate //převážet// (Pred) and the final punctuation mark (AuxK). The predicate governs the subject //Plavidlo// (Sb) and the object //turisty// (Obj). The auxiliary verb //bude// (AuxV) forms one verbal unit with the verb //převážet//, and it is therefore also depicted as being dependent on this node. Furthermore, the verb //převážet// also governs the coordinated prepositional phrase with a locative adverbial function, //mezi minaretem a zříceninou Janohrad//. In PDT, on the level of surface syntax the preposition's function is to formally govern, so the verb //převážet// governs the preposition (AuxP), which in turn governs the representative of the coordinating relation, the conjunction //a// (Coord). Both coordinated nouns from the prepositional phrase //minaretem a zříceninou// are dependent on the coordination node (Adv_Co: Adv, i.e. adverbial function, is furnished with an additional component _Co, which indicates coordinated members). The noun //zříceninou// is further modified by the incongruent attribute //Janohrad// (Atr). The coordinating node (Coord) also governs the prepositional phrase //v parku//, which itself is not coordinated, but modifies both members of the coordination, i.e. both the word //minaretem// and the word //zříceninou//. Again, the preposition //v// (AuxP) is dependent on the governing member, and the noun //parku// with the attributive function (Atr) is in turn dependent on the preposition. 
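The structure described for the example sentence can be written down as a small data structure. The sketch below is only an illustration: the dependencies and syntactic tags follow the description of the example, while the absolute 1-based positions, the use of 0 for the technical root, and the helper name ''dependents_by_parent'' are assumptions made for readability.

```python
from collections import defaultdict

# (position, form, parent position, syntactic tag).
# Parent 0 stands for the technical root; absolute positions are an
# assumption of this sketch, not necessarily the corpus encoding.
SENTENCE = [
    (1, "Plavidlo", 3, "Sb"),
    (2, "bude", 3, "AuxV"),
    (3, "převážet", 0, "Pred"),
    (4, "turisty", 3, "Obj"),
    (5, "mezi", 3, "AuxP"),
    (6, "minaretem", 7, "Adv_Co"),
    (7, "a", 5, "Coord"),
    (8, "zříceninou", 7, "Adv_Co"),
    (9, "Janohrad", 8, "Atr"),
    (10, "v", 7, "AuxP"),
    (11, "parku", 10, "Atr"),
    (12, ".", 0, "AuxK"),
]


def dependents_by_parent(sentence):
    """Group token forms under the position of their governing token."""
    children = defaultdict(list)
    for _position, form, parent, _tag in sentence:
        children[parent].append(form)
    return dict(children)


children = dependents_by_parent(SENTENCE)
forms = {position: form for position, form, _parent, _tag in SENTENCE}
forms[0] = "ROOT"

# Print each governing node with its direct dependents in sentence order.
for parent in sorted(children):
    print(f"{forms[parent]} -> {', '.join(children[parent])}")
```

Every token has exactly one parent, so the list encodes a tree: following the parent positions from any token always ends at the technical root.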
 + 
 +===== Visualisation of syntactic structures in KonText ===== 
 + 
 +For every sentence in a syntactically annotated corpus (currently [[en:cnk:syn2015|SYN2015]] and [[en:cnk:syn2020|SYN2020]]), a syntactic structure can be visualised by clicking on a little icon representing a syntactic tree on the left side of a concordance line (marked with a red circle in the following image):\\ 
 + 
 +{{:pojmy:zobrazenisyntaxe.png?500|Syntactic structure visualisation}}\\ 
 + 
 +By clicking on the icon, a representation of the syntactic structure is displayed (a syntactic tree). The left-to-right order in the syntactic representation corresponds to the order in the sentence; the dependent tokens are placed below the governing tokens. The following image represents the structure of a subordinate clause from the SYN2020 corpus "//aby ses měla nač vymluvit//" [so that you can find an excuse]. The sentence contains three so-called [[en:cnk:syn2020:agregat|aggregates]], i.e. tokens containing two or more syntactic words:\\
  
-The technical root of the dependency tree (top left, with the sentence identifier) governs the predicate //převážet// (Pred) and final punctuation mark (AuxK)The predicate governs the subject //Plavidlo// (Sb) and object //turisty// (Obj). The auxiliary verb //bude// (AuxV) forms one verbal unit with the verb //převážet//, and it is therefore also depicted as being dependent on this node. Furthermore, the verb //převážet// also governs the coordinated prepositional phrase with a locative adverbial function// mezi minaretem a zříceninou Janohrad//. In PDT, on the level of surface syntax the preposition's function is to formally govern, so the verb //převážet// governs the preposition (AuxP), which in turn governs the representative of the coordinating relation, the conjunction //a// (Coord). Both coordinated nouns from the prepositional phrase //minaretem a zříceninou// are dependent on the coordination node (Adv_Co: Adv, i.e. adverbial function, is furnished with an additional component _Co, which indicates coordinated members). Podstatné jméno //zříceninou// je dále rozvito neshodným přívlastkem //Janohrad// (Atr). Na koordinačním uzlu (Coord) je také závislá předložková fráze //v parku//, která sice není koordinovaná, ale rozvíjí oba členy koordinace, tj. jak slovo //minaretem//, tak slovo //zříceninou//. Opět je zde předložka //v// (AuxP) závislá na řídícím členu, na předložce je pak závislé substantivum //parku// s funkcí přívlastku (Atr).+{{:cnk:syn2020:agregaty_syntax.png?250|Example of syntactic structure in Kontext}}\\
  
-===== Vyhledávání syntaktických struktur v KonTextusyntaktické atributy =====+===== Searching KonText for syntactic structures: syntactic attributes =====
  
-Pro prohlížení syntakticky anotovaných korpusů se obvykle používají speciální prohlížeče schopné zobrazit syntaktickou strukturunapříklad program [[https://ufal.mff.cuni.cz/tred/|TrEd]]. V prohlížeči [[manualy:kontext|KonTextu]] možnost zobrazovat syntaktickou strukturu není, lze ale vyhledávat slova a slovní spojení podle syntaktických parametrů. K tomu je každému tokenu přiřazeno několik [[pojmy:atributy_pozicni|atributů]], některé další atributy jsou pak přiřazeny jen vybraným tokenůmVšechny syntaktické atributy jsou popsané v [[seznamy:syntakticke_znacky|samostatném článku]]. Základní syntaktické atributy přiřazené všem tokenům jsou:  +It is possible to formulate queries in KonText based on syntactic properties of words. For this purpose, each token is assigned several [[en:pojmy:atributy_pozicni|attributes]]. All syntactic attributes are described in a [[en:seznamy:syntakticke_znacky|separate entry]]. The basic syntactic attributes assigned to all tokens are:  
-  * [[seznamy:parent|parent]] (číselný odkaz na pozici řídícího tokenu)  +  * [[en:seznamy:parent|parent]] (numeric reference to the position of the governing token)  
-  * [[seznamy:afun|afun]] (syntaktická funkce)+  * [[en:seznamy:afun|afun]] (syntactic function)
    
-Další atributy umožňují vyhledávat podle vlastností rodiče“. U autosémantických slov lze vyhledávat i podle efektivního rodiče“, což je nejbližší autosémantický rodič (či prarodičdaného slovaVe výše uvedeném příkladu by tak slovu //zříceninou//, které je závislé es koordinaci a předložku //mezi// na slovese //převážet//, byly přiřazeny následující atributy:+Other attributes allow us to search based on the properties of the „parent“. In the case of autosemantic words we can also search based on the „effective parent“, which is the given word's closest autosemantic parent (or grandparent). In the previously mentioned example, this would mean that the word //zříceninou//, which is dependent on the verb //převážet// by way of coordination and the preposition //mezi//, would be assigned the following attributes:
  
 ''%%afun="Adv_Co";%%'' ''%%afun="Adv_Co";%%''
Line 39: Line 57:
  
  
-V korpusu pak lze podle těchto atributů vyhledávatnapřlze vyhledat všechna substantiva v akuzativu se syntaktickou funkcí Obj závislá na slovese //převážet//:+The corpus allows us to search based on these attributes, e.g. a search for all nouns in the accusative case with the syntactic function Obj and dependent on the verb //převážet// would look like this:
 ''%%[afun="Obj" & tag="NN..4.*" & p_lemma="převážet"]%%'' ''%%[afun="Obj" & tag="NN..4.*" & p_lemma="převážet"]%%''
  
-Nebo lze vyhledat všechna slova (syntaktická substantivav sedmém pádě s předložkou mezi závislá na slovese v infinitivu: ''%%[prep="mezi" & case="7" & ep_tag="Vf.*"]%%''.+We can also search for all words (syntactic nounsin the 7th case (instrumental) with the preposition //mezi// which are dependent on a verb in the infinitive: ''%%[prep="mezi" & case="7" & ep_tag="Vf.*"]%%''.
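To make the logic of such single-position queries concrete, the following sketch emulates them outside KonText. It assumes that every condition is a regular expression that must match the whole attribute value and that all conditions are joined by ''&''; the helper ''matches'', the token dictionaries and the morphological tags in them are hypothetical illustrations, not corpus data and not the KonText implementation.

```python
import re


def matches(token: dict, conditions: dict) -> bool:
    """True if every condition regex matches the whole attribute value."""
    return all(
        re.fullmatch(pattern, token.get(attribute, "")) is not None
        for attribute, pattern in conditions.items()
    )


# Hypothetical tokens; attribute names follow the queries above,
# the attribute values are made up for illustration.
tokens = [
    {"word": "turisty", "afun": "Obj", "tag": "NNMP4-----A----",
     "p_lemma": "převážet"},
    {"word": "Plavidlo", "afun": "Sb", "tag": "NNNS1-----A----",
     "p_lemma": "převážet"},
    {"word": "zříceninou", "afun": "Adv_Co", "tag": "NNFS7-----A----",
     "prep": "mezi", "case": "7", "ep_tag": "Vf--------A----"},
]

# Counterparts of the two queries shown above.
query_obj = {"afun": "Obj", "tag": "NN..4.*", "p_lemma": "převážet"}
query_mezi = {"prep": "mezi", "case": "7", "ep_tag": "Vf.*"}

hits_obj = [t["word"] for t in tokens if matches(t, query_obj)]
hits_mezi = [t["word"] for t in tokens if matches(t, query_mezi)]
print(hits_obj, hits_mezi)
```

A token that lacks one of the queried attributes is treated here as having an empty value, so it fails any condition that requires a non-empty match.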
  
  --- //Tomáš Jelínek//  --- //Tomáš Jelínek//