
Differences

This shows you the differences between two versions of the page.

en:cnk:fictree [2017/12/15 15:17] – [The syntactic annotation of the treebank] tomasjelinek
en:cnk:fictree [2017/12/18 19:24] – [How to cite FicTree] michalkren
Line 1:
 ===== FicTree: a manually annotated treebank of Czech fiction =====

-The FicTree treebank is a syntactically annotated corpus of Czech fiction. It consists of 135,000 words (166,000 tokens).  The lemmatization, the morphologic and syntactic annotation were performed manually.
+The FicTree treebank is a syntactically annotated corpus of Czech fiction. It consists of 135,000 words (166,000 tokens).  The lemmatization, the morphological and syntactic annotation were performed manually.
 <WRAP right 35%> <WRAP right 35%>
 ^ <fs medium>Name</fs> ^^ <fs medium>FicTree</fs> ^
Line 14:

 The FicTree treebank consists of eight literary works published in the Czech Republic between 1991 and 2007. The texts in the treebank include six fiction titles, a children’s fiction book, and a book of memoirs.
-Most of the texts were first published between 1991 and 2007 except one text, published in 1969.
+Most of the texts were first published between 1991 and 2007 except for one text, published in 1969.
 Five texts (80% of all tokens) are original Czech texts; the other three are translations (from German and Slovak).
  
Line 25:
 The FicTree treebank can be accessed in several ways:
   - [[en:cnk:fictree#a_cnc_corpus_in_the_kontext_interface|A CNC corpus in the KonText interface]]: FicTree is available as a [[en:cnk:uvod|CNC corpus]] in the [[en:manualy:kontext:index|KonText]] interface.
-  - [[en:cnk:fictree#data_annotated_according_to_pdt_a-layer|Data annotated according to PDT a-layer]]: the data of the FicTree treebank annotated according to the [[http://ufal.mff.cuni.cz/pdt2.0/doc/manuals/cz/a-layer/html/index.html|PDT a-layer guidelines]] are available to download from the [[https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2517|LINDAT/CLARIN]] repository for non-commercial use.
-  - [[en:cnk:fictree#data_annotated_in_the_Universal_Dependencies_standard|Data annotated in the Universal Dependencies standard]]: the data of the FicTree treebank annotated according to the [[http://universaldependencies.org/|Universal Dependencies]] standard into which it was automatically converted  (for non-commercial use only).
+  - [[en:cnk:fictree#data_annotated_according_to_pdt_a-layer|Data annotated according to PDT a-layer]]: the data of the FicTree treebank annotated according to the [[http://ufal.mff.cuni.cz/pdt2.0/doc/manuals/cz/a-layer/html/index.html|PDT a-layer guidelines]] are available for download from the [[https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2517|LINDAT/CLARIN]] repository for non-commercial use.
+  - [[en:cnk:fictree#data_annotated_in_the_Universal_Dependencies_standard|Data annotated in the Universal Dependencies standard]]: the data of the FicTree treebank annotated according to the [[http://universaldependencies.org/|Universal Dependencies]] standard into which it was automatically converted are available through the [[http://universaldependencies.org/treebanks/cs_fictree/index.html|UD web page]] (for non-commercial use only).

 ===== 1. A CNC corpus in the KonText interface =====
Line 32:
 The FicTree corpus is available in the same way as other CNC corpora through the [[en:manualy:kontext:index|KonText]] interface.

-The corpus annotation is accessible through a wide range of attributes of each token. The morphologic and annotation and lemmatization is available using the attributes [[seznamy:tagy|tag]] and [[en:pojmy:lemma|lemma]]; additionally, the information about the POS and nominal case (if applicable) of all tokens is accessible using the attributes **pos** and **case**.
+The corpus annotation is accessible through a wide range of attributes for each token. The morphological annotation and lemmatization are available using the attributes [[seznamy:tagy|tag]] and [[en:pojmy:lemma|lemma]]; additionally, the information about the POS and nominal case (if applicable) of all tokens is accessible using the attributes **pos** and **case**.

-The syntactic annotation of FicTree can be accessed using several positional attributes (the same as in the corpus SYN2015):
+The syntactic annotation of FicTree can be accessed using several positional attributes (the same as in the SYN2015 corpus):
   * afun – syntactic function according to the a-layer PDT annotation
   * parent – relative position of the governing token
   * eparent – relative position of the nearest governing content word
   * prep – lemma of a preposition governing the token (if any)
-  * p_lemma, p_tag, ep_lemma, ep_tag – tag lemma of the governing token
+  * p_lemma, p_tag, ep_lemma, ep_tag – tag and lemma of the governing token
   * p_pos, p_case, ep_pos, ep_case – POS and case of the governing token
   * p_afun, ep_afun – syntactic function of the governing token
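
The **parent** attribute in the list above stores the governor as a position relative to the token itself. The following minimal sketch shows how such relative positions could be turned into absolute token indices within one sentence. The encoding used here is an assumption made for illustration only: offsets are taken to be signed integers written as strings (such as ''+1'' or ''-1''), and ''0'' is taken to mean that the token has no governor in the sentence. The function name ''resolve_parents'' is hypothetical, and the actual value format should be checked against the corpus itself.

```python
from typing import List, Optional


def resolve_parents(offsets: List[str]) -> List[Optional[int]]:
    """Convert relative governor offsets into absolute token indices.

    Assumption (not confirmed by the page): each offset is a signed
    integer string, and "0" marks a token without a governor.
    """
    absolute: List[Optional[int]] = []
    for index, raw in enumerate(offsets):
        offset = int(raw)  # int() accepts both "+1" and "-1"
        if offset == 0:
            absolute.append(None)  # assumed marker for the sentence root
            continue
        target = index + offset
        if not 0 <= target < len(offsets):
            raise ValueError(f"offset {raw!r} at token {index} leaves the sentence")
        absolute.append(target)
    return absolute


# Toy three-token sentence: "Pes štěká ."
tokens = ["Pes", "štěká", "."]
heads = resolve_parents(["+1", "0", "-1"])
for token, head in zip(tokens, heads):
    governor = tokens[head] if head is not None else "(root)"
    print(f"{token} <- {governor}")
```

The same conversion would apply to **eparent**, which differs only in pointing to the nearest governing content word.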
Line 50:
 ===== 3. Data annotated in the Universal Dependencies standard =====

-The morphological and syntactic annotation according to the UD guidelines was assigned by converting the original PDT annotation. The conversion procedure was designed by Dan Zeman and implemented in [[https://github.com/ufal/treex|Treex]].
+The morphological and syntactic annotation according to the UD guidelines was performed by converting the original PDT annotation. The conversion procedure was designed by Dan Zeman and implemented in [[https://github.com/ufal/treex|Treex]].
 The data are available on the [[http://universaldependencies.org/treebanks/cs_fictree/index.html|Universal Dependencies]] webpage. They are in the [[http://universaldependencies.org/format.html|CoNLL-U format]]. The original texts are divided into segments of at most 100 tokens; the segments are shuffled and divided into train, val and test data sets. The FicTree treebank in the UD standard is also accessible using the query tool [[https://lindat.mff.cuni.cz/services/pmltq/|PML-TQ]].
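
Since the UD release is distributed in the CoNLL-U format, the data can be read with a few lines of Python. The sketch below is not an official reader. It relies only on the general layout of the format: ten tab-separated columns per token, comment lines starting with ''#'', and a blank line after each sentence. The function name ''read_conllu'' and the sample sentence are hypothetical and serve only as illustration.

```python
from typing import Dict, Iterable, Iterator, List

# The ten CoNLL-U columns, in order.
CONLLU_FIELDS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]


def read_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield sentences as lists of token dictionaries."""
    sentence: List[Dict[str, str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:  # a blank line closes the current sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):  # sentence-level comment, e.g. "# text = ..."
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(columns)}: {line!r}")
        token = dict(zip(CONLLU_FIELDS, columns))
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in token["id"] or "." in token["id"]:
            continue
        sentence.append(token)
    if sentence:  # file did not end with a blank line
        yield sentence


# Hypothetical sample sentence, not taken from FicTree.
sample = (
    "# text = Pes štěká.\n"
    "1\tPes\tpes\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tštěká\tštěkat\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)
sentences = list(read_conllu(sample.splitlines()))
for token in sentences[0]:
    print(token["id"], token["form"], token["upos"], token["head"], token["deprel"])
```

To read a downloaded file, pass an open file object (opened with ''encoding="utf-8"'') to ''read_conllu'' instead of the sample lines.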
  
 ===== Acknowledgments =====
-We wish to thank the participants in the annotation effort: Ivana Klímová, Alena Kropíková and Olga Zitová; as well as Dan Zeman for the data conversion.
+We wish to thank the human annotators: Ivana Klímová, Alena Kropíková and Olga Zitová; as well as Dan Zeman for the data conversion.
  
 ===== How to cite FicTree =====
Line 60:
 Jelínek, T. – Hnátková, M. – Skoumalová, H.: FicTree: manuálně syntakticky anotovaný korpus české beletrie. Ústav Českého národního korpusu FF UK, Praha 2017. Dostupný z WWW: http://www.korpus.cz

-Jelínek, T.: //FicTree: a Manually Annotated Treebank of Czech Fiction.// In: J. Hlaváčová (Ed.): ITAT 2017 Proceedings, pp. 181–185. http://ceur-ws.org/Vol-1885/181.pdf
+Jelínek, T.: FicTree: a Manually Annotated Treebank of Czech Fiction. In: J. Hlaváčová (Ed.): //ITAT 2017 Proceedings//, pp. 181–185. http://ceur-ws.org/Vol-1885/181.pdf
 </WRAP> </WRAP>