en:pojmy:syntakticka_analyza, current revision [2026/01/19 10:45] by tomasjelinek
====== Syntactic analysis and syntactic tagging ======

Some CNC corpora (the first being [[en:cnk:syn2015|SYN2015]]) are syntactically annotated. The annotation marks the dependency relations between pairs of words in a sentence and the analytical (syntactic) function of each word. It is based on the principles of the analytical layer of the [[http://ufal.mff.cuni.cz/pdt2.0/index-cz.html|Prague Dependency Treebank]] (PDT). The [[en:cnk:intercorp|InterCorp]] parallel corpus, in its release [[en:cnk:intercorp:verze13ud|13ud]], is annotated syntactically (and also morphologically) in a different way, following the guidelines of the international [[en:pojmy:ud|Universal Dependencies]] project.

===== The system of syntactic annotation: the analytical layer of the Prague Dependency Treebank =====
==== Automatic syntactic annotation: parsing ====

Syntactic annotation is done automatically, using a syntactic [[en:pojmy:parser|parser]]. SYN2015 was annotated with TurboParser, while SYN2020 and SYN2025 were annotated with a neural parser from the NeuroNLP2 toolkit. Parsing has a higher error rate than [[en:pojmy:morfologicka_analyza|morphological annotation]]. In SYN2020, more than one [[en:pojmy:token|token]] in nine lacks a correctly identified "parent" (governing token) or a correctly assigned syntactic function. In SYN2015, this is true of more than one token in six.\\
Parsing accuracy is measured by two scores. UAS (unlabeled attachment score) is the proportion of tokens whose parent is identified correctly. LAS (labeled attachment score) is the proportion of tokens whose parent and syntactic function are both identified correctly. For SYN2015, SYN2020 and SYN2025, the scores are as follows:

^ Corpus ^ UAS ^ LAS ^
| SYN2015 | 88.48% | 82.46% |
| SYN2020 | 92.39% | 88.73% |
| SYN2025 | 92.56% | 88.94% |
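The following minimal Python sketch makes the two scores concrete. It computes UAS and LAS by comparing a parser's output with a manually annotated (gold) sentence. Several details are hypothetical and serve only as an illustration: the token representation (a pair of parent index and function label, with 0 standing for the root), the function name `attachment_scores` and the example data. None of this is the CNC's actual evaluation tooling.

```python
from typing import Sequence, Tuple

# Hypothetical representation of one analysed token:
# (index of its parent, syntactic function label); index 0 is the artificial root.
Token = Tuple[int, str]


def attachment_scores(gold: Sequence[Token], predicted: Sequence[Token]) -> Tuple[float, float]:
    """Return (UAS, LAS) for parallel gold and parser-predicted annotations."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and of equal length")
    # UAS counts tokens whose parent is correct, regardless of the label.
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS counts tokens whose parent AND syntactic function are both correct.
    both_ok = sum(g == p for g, p in zip(gold, predicted))
    return head_ok / len(gold), both_ok / len(gold)


# Hypothetical four-token sentence (not taken from any CNC corpus).
gold = [(2, "Sb"), (0, "Pred"), (2, "Obj"), (3, "Atr")]
pred = [(2, "Sb"), (0, "Pred"), (2, "Adv"), (2, "Atr")]  # one wrong label, one wrong parent

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2%}, LAS = {las:.2%}")  # UAS = 75.00%, LAS = 50.00%

# The share of tokens with a wrong parent or a wrong function is 1 - LAS.
# With the SYN2020 LAS from the table above, this is slightly more than 1/9.
print(f"{1 - 0.8873:.4f} > {1 / 9:.4f}")  # 0.1127 > 0.1111
```

Since a correct label only counts when the parent is also correct, LAS can never exceed UAS. This matches every row of the table.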
===== Visualisation of syntactic structures in KonText =====

For every sentence in a syntactically annotated corpus (currently [[en:cnk:syn2025|SYN2025]], [[en:cnk:syn2020|SYN2020]] and [[en:cnk:syn2015|SYN2015]]), the syntactic structure can be visualised by clicking the small syntactic-tree icon on the left side of the concordance line (marked with a red circle in the following image):\\

{{:pojmy:zobrazenisyntaxe.png?500|Syntactic structure visualisation}}\\

Clicking the icon displays the syntactic structure as a tree. The left-to-right order of the nodes corresponds to the word order of the sentence, and dependent tokens are placed below their governing tokens. The following image shows the structure of a subordinate clause from the SYN2020 corpus, "//aby ses měla nač vymluvit//" [so that you can find an excuse]. The clause contains three so-called [[en:cnk:syn2020:agregat|aggregates]], i.e. tokens consisting of two or more syntactic words:\\

{{:cnk:syn2020:agregaty_syntax.png?250|Example of syntactic structure in KonText}}\\
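A short Python sketch can illustrate the layout principle of the tree view: sentence order runs from left to right, and dependents sit below their governors. The sketch gives each token a row equal to its distance from the root and prints a plain-text approximation of the tree. Several details are assumptions made for illustration only: the function names `tree_levels` and `render`, the 1-based parent indices with 0 for the root, and the English example sentence. This is not how KonText actually draws its trees.

```python
from typing import List


def tree_levels(heads: List[int]) -> List[int]:
    """Return the drawing row of each token: 1 for dependents of the root,
    2 for their dependents, and so on.
    heads[i] is the 1-based index of the parent of token i+1 (0 = root)."""
    levels = []
    for start in range(1, len(heads) + 1):
        depth, node = 0, start
        # Walk up the parent chain until the artificial root is reached.
        while node != 0:
            node = heads[node - 1]
            depth += 1
            if depth > len(heads):
                raise ValueError("heads do not form a tree (cycle detected)")
        levels.append(depth)
    return levels


def render(words: List[str], heads: List[int]) -> str:
    """Place each word in its own column (sentence order) and on its tree row."""
    levels = tree_levels(heads)
    width = max(len(w) for w in words) + 2
    rows = []
    for row in range(1, max(levels) + 1):
        cells = [w.ljust(width) if lvl == row else " " * width
                 for w, lvl in zip(words, levels)]
        rows.append("".join(cells).rstrip())
    return "\n".join(rows)


# Hypothetical example: "found" governs "she" and "excuse"; "excuse" governs "an".
words = ["she", "found", "an", "excuse"]
heads = [2, 0, 4, 2]
print(tree_levels(heads))  # [2, 1, 3, 2]
print(render(words, heads))
```

Keeping one column per token preserves the linear order of the sentence, while the row number alone encodes the dependency hierarchy.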