
Differences

This shows you the differences between two versions of the page.

en:cnk:fictree [2017/12/18 09:40] – [1. A CNC corpus in the KonText interface] luciechlumska
en:cnk:fictree [2017/12/18 19:24] (current) – [How to cite FicTree] michalkren
Line 1:
 ===== FicTree: a manually annotated treebank of Czech fiction =====
 
-The FicTree treebank is a syntactically annotated corpus of Czech fiction. It consists of 135,000 words (166,000 tokens).  The lemmatization, the morphologic and syntactic annotation were performed manually.
+The FicTree treebank is a syntactically annotated corpus of Czech fiction. It consists of 135,000 words (166,000 tokens).  The lemmatization, the morphological and syntactic annotation were performed manually.
 <WRAP right 35%>
 ^ <fs medium>Name</fs> ^^ <fs medium>FicTree</fs> ^
Line 50:
 ===== 3. Data annotated in the Universal Dependencies standard =====
 
-The morphological and syntactic annotation according to the UD guidelines was assigned by converting the original PDT annotation. The conversion procedure was designed by Dan Zeman and implemented in [[https://github.com/ufal/treex|Treex]].
+The morphological and syntactic annotation according to the UD guidelines was performed by converting the original PDT annotation. The conversion procedure was designed by Dan Zeman and implemented in [[https://github.com/ufal/treex|Treex]].
 The data are available on the [[http://universaldependencies.org/treebanks/cs_fictree/index.html|Universal Dependencies]] webpage. They are in the [[http://universaldependencies.org/format.html|CONLL-U format]]. The original texts are divided into segments of maximum 100 tokens, the segments are shuffled and divided into a train, val and test data set. The FicTree treebank in UD standard is also accessible using the query tool  [[https://lindat.mff.cuni.cz/services/pmltq/|PML-TQ]].
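
The segmentation and split described in the paragraph above can be illustrated with a minimal sketch. It is not the procedure actually used for FicTree: the greedy packing of whole sentences, the 80/10/10 ratios, the fixed seed and the function names `make_segments` and `split_segments` are all assumptions made for illustration.

```python
import random


def make_segments(sentences, max_tokens=100):
    """Greedily pack whole sentences into segments of at most max_tokens tokens.

    Assumption: segments never cut a sentence. A single sentence longer than
    max_tokens therefore ends up alone in an oversized segment.
    """
    segments, current, size = [], [], 0
    for sent in sentences:
        # Close the current segment if adding this sentence would exceed the limit.
        if current and size + len(sent) > max_tokens:
            segments.append(current)
            current, size = [], 0
        current.append(sent)
        size += len(sent)
    if current:
        segments.append(current)
    return segments


def split_segments(segments, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle segments and cut them into train, val and test lists.

    The ratios and the seed are hypothetical; the source does not state them.
    """
    shuffled = list(segments)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```

Here each sentence is a list of token strings, so `make_segments` can be fed sentences read from any tokenized source. Shuffling at the segment level rather than the sentence level keeps short stretches of running text together inside each data set.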
  
 ===== Acknowledgments =====
-We wish to thank the participants in the annotation effort: Ivana Klímová, Alena Kropíková and Olga Zitová; as well as Dan Zeman for the data conversion.
+We wish to thank the human annotators: Ivana Klímová, Alena Kropíková and Olga Zitová; as well as Dan Zeman for the data conversion.
  
 ===== How to cite FicTree =====
 <WRAP round tip 70%>
-Jelínek, T. – Hnátková, M. – Skoumalová, H.: FicTree: manuálně syntakticky anotovaný korpus české beletrie. Ústav Českého národního korpusu FF UK, Praha 2017. Dostupný z WWW: http://www.korpus.cz
+Jelínek, T. – Hnátková, M. – Skoumalová, H.: //FicTree: manuálně syntakticky anotovaný korpus české beletrie//. Ústav Českého národního korpusu FF UK, Praha 2017. Dostupný z WWW: http://www.korpus.cz
 
-Jelínek, T.: //FicTree: a Manually Annotated Treebank of Czech Fiction.// In: J. Hlaváčová (Ed.): ITAT 2017 Proceedings, pp. 181–185. http://ceur-ws.org/Vol-1885/181.pdf
+Jelínek, T.: FicTree: a Manually Annotated Treebank of Czech Fiction. In: J. Hlaváčová (Ed.): //ITAT 2017 Proceedings//, pp. 181–185. http://ceur-ws.org/Vol-1885/181.pdf
 </WRAP>