Differences
This shows you the differences between two versions of the page.
en:cnk:fictree [2017/11/14 09:22] – tomasjelinek | en:cnk:fictree [2017/12/18 19:24] (current) – [How to cite FicTree] michalkren | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ===== FicTree | + | ===== FicTree: a manually annotated |
+ | The FicTree treebank is a syntactically annotated corpus of Czech fiction. It consists of 135,000 words (166,000 tokens). | ||
<WRAP right 35%> | <WRAP right 35%> | ||
^ <fs medium> | ^ <fs medium> | ||
Line 11: | Line 11: | ||
^ ::: ^ Publication date | 2017 | | ^ ::: ^ Publication date | 2017 | | ||
</ | </ | ||
- | FicTree is a syntactically annotated corpus | + | ===== The composition |
- | The texts come from eight literary works published in the Czech Republic between 1991 and 2007. | + | |
- | The text data was manually annotated according to the [[https:// | + | |
- | guidelines]] (annotation on the analytical layer). | + | |
- | To comply with agreements concluded with the copyright holders, the texts are shuffled into random chunks of maximum 100 words (respecting sentence boundaries). | + | |
- | === Annotation procedure === | + | The FicTree treebank consists of eight literary works published in the Czech Republic between 1991 and 2007. The texts in the treebank include six fiction titles, a children’s fiction |
- | The texts were parsed independently by two parsers trained on the Prague Dependency Treebank data (analytical layer). The parsing results were manually | + | All but one of the texts were first published between 1991 and 2007; the remaining text was published in 1969. | ||
- | corrected and the two versions merged. Any differences | + | Five texts (80% of all tokens) are original Czech texts; the other three are translations (from German and Slovak). | ||
- | === Text details | + | ===== The syntactic annotation of the treebank ===== |
- | The eight texts in the treebank | + | The FicTree |
- | Most of the texts were first published between 1991 and 2007 except one text, published in 1969. | + | |
- | 80% of the texts are original Czech texts, 20% are translations (from German and Slovak). | + | |
- | === References | + | ===== Access to the treebank ===== |
- | Tomáš Jelínek, 2017. //FicTree: a Manually Annotated Treebank of Czech Fiction// | + | |
- | In: J. Hlaváčová (Ed.): ITAT 2017 Proceedings, | + | |
- | http:// | + | |
- | === Acknowledgments === | + | The FicTree treebank can be accessed in several ways: |
+ | - [[en: | ||
+ | - [[en: | ||
+ | - [[en: | ||
- | We wish to thank the participants | + | ===== 1. A CNC corpus |
- | Olga Zitová. | + | |
+ | The FicTree corpus is available in the same way as other CNC corpora through the [[en: | ||
+ | The corpus annotation is accessible through a wide range of attributes for each token. The morphological annotation and lemmatization are available using the attributes [[seznamy: | ||
+ | |||
+ | The syntactic annotation of FicTree can be accessed using several positional attributes (the same as in the SYN2015 corpus): | ||
+ | * afun – syntactic function according to the a-layer PDT annotation | ||
+ | * parent – relative position of the governing token | ||
+ | * eparent – relative position of the nearest governing content word | ||
+ | * prep – lemma of a preposition governing the token (if any) | ||
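+ | * p_lemma, p_tag, ep_lemma, ep_tag – lemma and tag of the governing token (p_) or of the nearest governing content word (ep_)
+ | * p_pos, p_case, ep_pos, ep_case – POS and case of the governing token (p_) or of the nearest governing content word (ep_)
+ | * p_afun, ep_afun – syntactic function of the governing token (p_) or of the nearest governing content word (ep_)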
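
As a rough illustration of how a relative-position attribute such as ''parent'' could be used, the sketch below resolves the governing token of each token from its offset. The sign convention (a negative offset means the governor stands to the left), the value ''0'' for a token without a governor, and the sample sentence with its values are all assumptions made for this example; they are not taken from the corpus documentation.

```python
from typing import List, Optional, Tuple

# Hypothetical sample: (word, afun, parent) triples. The offsets and the
# convention "0 means no governor" are assumptions for illustration only.
SENTENCE = [
    ("Petr", "Sb", "+1"),
    ("četl", "Pred", "0"),
    ("knihu", "Obj", "-1"),
]


def governor_index(position: int, parent: str) -> Optional[int]:
    """Return the index of the governing token, or None if there is none."""
    offset = int(parent)  # int() accepts a leading "+" or "-"
    if offset == 0:
        return None
    return position + offset


def governors(sentence: List[Tuple[str, str, str]]) -> List[Tuple[str, Optional[str]]]:
    """Pair each word with the word form of its governor (None for the root)."""
    result = []
    for i, (word, _afun, parent) in enumerate(sentence):
        head = governor_index(i, parent)
        if head is not None and not 0 <= head < len(sentence):
            raise ValueError(f"offset {parent} at position {i} leaves the sentence")
        result.append((word, None if head is None else sentence[head][0]))
    return result
```

The same lookup would apply to ''eparent''; only the attribute supplying the offset changes.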
+ | |||
+ | ===== 2. Data annotated according to PDT a-layer ===== | ||
+ | |||
+ | The data of the FicTree treebank, annotated according to the PDT a-layer guidelines, are available through the | ||
+ | [[https:// | ||
+ | |||
+ | ===== 3. Data annotated in the Universal Dependencies standard ===== | ||
+ | |||
+ | The morphological and syntactic annotation according to the UD guidelines was performed by converting the original PDT annotation. The conversion procedure was designed by Dan Zeman and implemented in [[https:// | ||
+ | The data are available on the [[http:// | ||
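
Universal Dependencies treebanks are conventionally distributed in the CoNLL-U format (ten tab-separated columns per token, ''#'' comment lines, blank lines between sentences). Assuming the UD version of FicTree follows this convention, a minimal reader might look like the sketch below. The function name ''read_conllu'' and the sample sentence are hypothetical and are not taken from the treebank.

```python
from typing import Dict, List

# Hypothetical one-sentence sample in CoNLL-U layout (not FicTree data).
SAMPLE = "\n".join([
    "# text = Petr četl knihu.",
    "\t".join(["1", "Petr", "Petr", "PROPN", "_", "_", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "četl", "číst", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["3", "knihu", "kniha", "NOUN", "_", "_", "2", "obj", "_", "_"]),
    "\t".join(["4", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"]),
    "",
])


def read_conllu(text: str) -> List[List[Dict[str, object]]]:
    """Parse CoNLL-U text into a list of sentences (lists of token dicts)."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():          # a blank line closes the sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):      # sentence-level comment
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if not cols[0].isdigit():     # skip multiword ranges (1-2) and empty nodes (1.1)
            continue
        current.append({
            "id": int(cols[0]), "form": cols[1], "lemma": cols[2],
            "upos": cols[3], "head": int(cols[6]), "deprel": cols[7],
        })
    if current:                       # the file may lack a final blank line
        sentences.append(current)
    return sentences
```

Here ''head'' is an absolute token index within the sentence (''0'' for the root), unlike the relative ''parent'' attribute of the CNC corpus version.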
+ | |||
+ | ===== Acknowledgments ===== | ||
+ | We wish to thank the human annotators Ivana Klímová, Alena Kropíková and Olga Zitová, as well as Dan Zeman for the data conversion. | ||
+ | |||
+ | ===== How to cite FicTree ===== | ||
+ | <WRAP round tip 70%> | ||
+ | Jelínek, T. – Hnátková, M. – Skoumalová, | ||
+ | |||
+ | Jelínek, T.: FicTree: a Manually Annotated Treebank of Czech Fiction. In: J. Hlaváčová (Ed.): //ITAT 2017 Proceedings//, | ||
+ | </ | ||