Differences
This shows you the differences between two versions of the page.
| | en:cnk:fictree [2017/12/15 15:17] – [The syntactic annotation of the treebank] tomasjelinek | | en:cnk:fictree [2017/12/18 19:24] (current) – [How to cite FicTree] michalkren |
|---|---|---|---|
| | Line 1: | | Line 1: |
| | ===== FicTree: a manually annotated treebank of Czech fiction ===== | | ===== FicTree: a manually annotated treebank of Czech fiction ===== |
| - | The FicTree treebank is a syntactically annotated corpus of Czech fiction. It consists of 135,000 words (166,000 tokens). | + | The FicTree treebank is a syntactically annotated corpus of Czech fiction. It consists of 135,000 words (166,000 tokens). |
| | <WRAP right 35%> | | <WRAP right 35%> |
| | ^ <fs medium> | | ^ <fs medium> |
| | Line 14: | | Line 14: |
| | The FicTree treebank consists of eight literary works published in the Czech Republic between 1991 and 2007. The texts in the treebank include six fiction titles, a children’s fiction | | The FicTree treebank consists of eight literary works published in the Czech Republic between 1991 and 2007. The texts in the treebank include six fiction titles, a children’s fiction |
| - | Most of the texts were first published between 1991 and 2007 except one text, published in 1969. | + | Most of the texts were first published between 1991 and 2007 except |
| | Five texts (80% of all tokens) are original Czech texts, the other three are translations (from German and Slovak). | | Five texts (80% of all tokens) are original Czech texts, the other three are translations (from German and Slovak). |
| | Line 25: | | Line 25: |
| | The FicTree treebank can be accessed in several ways: | | The FicTree treebank can be accessed in several ways: |
| | - [[en: | | - [[en: |
| - | - [[en: | + | - [[en: |
| - | - [[en: | + | - [[en: |
| | ===== 1. A CNC corpus in the KonText interface ===== | | ===== 1. A CNC corpus in the KonText interface ===== |
| | Line 32: | | Line 32: |
| | The FicTree corpus is available in the same way as other CNC corpora through the [[en: | | The FicTree corpus is available in the same way as other CNC corpora through the [[en: |
| - | The corpus annotation is accessible through a wide range of attributes | + | The corpus annotation is accessible through a wide range of attributes |
| - | The syntactic annotation of FicTree can be accessed using several positional attributes (the same as in the corpus | + | The syntactic annotation of FicTree can be accessed using several positional attributes (the same as in the SYN2015 |
| | * afun – syntactic function according to the a-layer PDT annotation | | * afun – syntactic function according to the a-layer PDT annotation |
| | * parent – relative position of the governing token | | * parent – relative position of the governing token |
| | * eparent – relative position of the nearest governing content word | | * eparent – relative position of the nearest governing content word |
| | * prep – lemma of a preposition governing the token (if any) | | * prep – lemma of a preposition governing the token (if any) |
| - | * p_lemma, p_tag, ep_lemma, ep_tag – tag a lemma of the governing token | + | * p_lemma, p_tag, ep_lemma, ep_tag – tag and lemma of the governing token |
| | * p_pos, p_case, ep_pos, ep_case – POS and case of the governing token | | * p_pos, p_case, ep_pos, ep_case – POS and case of the governing token |
| | * p_afun, ep_afun – syntactic function of the governing token | | * p_afun, ep_afun – syntactic function of the governing token |
| | Line 50: | | Line 50: |
| | ===== 3. Data annotated in the Universal Dependencies standard ===== | | ===== 3. Data annotated in the Universal Dependencies standard ===== |
| - | The morphological and syntactic annotation according to the UD guidelines was assigned | + | The morphological and syntactic annotation according to the UD guidelines was performed |
| | The data are available on the [[http:// | | The data are available on the [[http:// |
| | ===== Acknowledgments ===== | | ===== Acknowledgments ===== |
| - | We wish to thank the participants in the annotation effort: Ivana Klímová, Alena Kropíková and Olga Zitová; as well as Dan Zeman for the data conversion. | + | We wish to thank the human annotators: Ivana Klímová, Alena Kropíková and Olga Zitová; as well as Dan Zeman for the data conversion. |
| | ===== How to cite FicTree ===== | | ===== How to cite FicTree ===== |
| | <WRAP round tip 70%> | | <WRAP round tip 70%> |
| - | Jelínek, T. – Hnátková, M. – Skoumalová, | + | Jelínek, T. – Hnátková, M. – Skoumalová, |
| - | Jelínek, T.: //FicTree: a Manually Annotated Treebank of Czech Fiction.// In: J. Hlaváčová (Ed.): ITAT 2017 Proceedings, | + | Jelínek, T.: FicTree: a Manually Annotated Treebank of Czech Fiction. In: J. Hlaváčová (Ed.): |
| | </ | | </ |
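The `parent` and `eparent` attributes in the syntactic-annotation rows store the position of the governing token relative to the current token, not as an absolute index. The minimal Python sketch below converts such offsets into absolute head indices and then looks up an attribute of each token's governing token, roughly how `p_afun` relates to `afun`. It assumes a hypothetical encoding in which offsets are signed integer strings such as `-2` or `+1` and `0` marks a token with no governing token. The actual KonText attribute values may be formatted differently. The function names `resolve_heads` and `governing_attr`, and the three-token example, are illustrative only.

```python
from typing import List, Optional


def resolve_heads(parent_offsets: List[str]) -> List[Optional[int]]:
    """Convert relative `parent` (or `eparent`) offsets to absolute 0-based head indices.

    Hypothetical encoding (an assumption, not taken from the corpus docs):
    each offset is a signed integer string counted from the current token,
    and "0" marks a token with no governing token in the sentence.
    """
    heads: List[Optional[int]] = []
    for i, raw in enumerate(parent_offsets):
        offset = int(raw)  # int() accepts a leading "+" or "-"
        if offset == 0:
            heads.append(None)  # no governing token (e.g. the sentence root)
            continue
        head = i + offset
        if not 0 <= head < len(parent_offsets):
            raise ValueError(f"token {i}: offset {raw} points outside the sentence")
        heads.append(head)
    return heads


def governing_attr(values: List[str], heads: List[Optional[int]]) -> List[Optional[str]]:
    """Return, for each token, the attribute value of its governing token (None if it has none)."""
    return [values[h] if h is not None else None for h in heads]


if __name__ == "__main__":
    # Illustrative sentence "Petr čte knihu" with invented offsets and afun values.
    heads = resolve_heads(["+1", "0", "-1"])
    print(heads)  # [1, None, 1]
    print(governing_attr(["Sb", "Pred", "Obj"], heads))  # ['Pred', None, 'Pred']
```

Resolving the offsets once and then indexing into any per-token attribute list keeps the lookup generic. Under the same assumption, the same helper would derive `p_lemma` or `p_tag` from `lemma` or `tag`.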