
Universal Dependencies – UD

Universal Dependencies is a framework for consistent annotation of grammar (parts of speech, morphological features, and syntactic dependencies) across different human languages. UD is an open community effort with over 300 contributors producing nearly 200 treebanks in over 100 languages. A recent version of the InterCorp parallel corpus (13ud) has been annotated in terms of morphological categories, syntactic functions and syntactic structure following the UD guidelines and using the tools developed within the UD project.

General guidelines for annotation are provided on the UD project website (UD Guidelines), including a detailed description of:

Key specifics of the UD annotation as used in InterCorp:

Morphological annotation

Parts of speech

Other categories

category | gloss | example values
Abbr | abbreviation | Yes
Animacy | animacy | Anim, Inan, Hum, Nhum
Aspect | aspect | Imp, Perf, Hab, Iter, Prog, Prosp
Case | case | Nom, Gen, Dat, Acc, Voc, Loc, Ins, …
Definite | definiteness | Ind, Def, …
Degree | degree | Pos, Cmp, Sup, Equ, Abs
Foreign | foreign word | Yes
Gender | gender | Fem, Masc, Neut, Com
Mood | mood | Ind, Imp, Cnd, …
NumType | numeral type | Card, Ord, Mult, Frac, Sets, …
Number | number | Sing, Plur, Dual, Ptan, Coll, …
Person | person | 1, 2, 3, …
Polarity | polarity | Neg, Pos
Polite | politeness | Infm, Form, Elev, Humb
Poss | possessiveness | Yes
PronType | type of pronoun etc. | Prs, Rcp, Art, Int, Rel, Exc, Dem, Emp, Tot, Ind
Reflex | reflexiveness | Yes
Tense | tense | Pres, Past, Fut, Pqp, Imp
Typo | typo | Yes
VerbForm | verb form | Fin, Inf, Part, Conv, Ger, Vnoun, Sup
Voice | voice | Act, Pass, Mid, Cau, …
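
In UD data, these categories are conventionally serialized as one string of Category=Value pairs joined by a vertical bar, for example Case=Nom|Number=Sing. The following minimal Python sketch parses such a string into a dictionary. It is not part of the InterCorp or UD tooling, and the function name parse_feats is hypothetical.

```python
def parse_feats(feats: str) -> dict:
    """Parse a UD feature string such as 'Case=Nom|Number=Sing' into a dict."""
    # In CoNLL-U, an underscore stands for "no features".
    if feats in ("", "_"):
        return {}
    pairs = (item.split("=", 1) for item in feats.split("|"))
    return {category: value for category, value in pairs}


example = parse_feats("Case=Nom|Gender=Fem|Number=Sing")
print(example["Case"])  # Nom
```

A dictionary makes it easy to test a single category without caring about the order in which the pairs are listed.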

Multi-part tokens

Syntactic annotation

Syntactic functions

deprel | gloss | example 4)
acl | adnominal clause, finite or non-finite | The convent of the Poor Clares, known as the Minories, was destroyed to make way for storehouses.
acl:relcl | relative adnominal clause | London has always been a vast ocean in which survival is not certain.
advcl | adverbial clause | The country will pay a heavy price if the president’s obsessions prevail for long.
advmod | adverbial modifier | They were all corrupt opportunists. Gorshkov knew where that idea came from.
amod | adjectival modifier | The sustainable future of humanity is at stake.
appos | apposition | They were going to a new home, a house of her choosing.
aux | auxiliary verb | We have made our voice heard by the world. It's going to work. You can't start improvising now.
aux:pass | passive auxiliary | Men like that are born only once. Who else should I get dressed up for if not her?
case | case marking (incl. preposition) | Karpov's own career might hang in the balance
cc | coordinating conjunction | I now invite you all to eat, drink, and make yourselves at home!
cc:preconj | preconjunct | They are poisoning both the water and the soil.
ccomp | clausal complement | I doubt whether the new model is an improvement.
clf | classifier | 学生 sān xuéshēng
compound | compound | In Gondor ten thousand years would not suffice.
compound:prt | phrasal verb particle | He laid out the city’s streets and rebuilt its walls.
conj | non-initial conjunct | You have two parents and you always will have.
cop | copula | Where's the rest of your luggage?
csubj | clausal subject, finite or non-finite | It's quite easy to clear up these contradictions. But the most important thing is you shouldn't lose too much time.
csubj:pass | clausal subject of passive clause | Taking notes has been banned.
dep | unspecified dependency | By the 1860s, the South was utterly flush with cash. My dad doesn't really not that good.
det | determiner | What way they went I don’t know and no rabbit knows.
det:predet | predeterminer | People get sick all the time.
discourse | discourse element | ‘Yes, please,’ said Ron. Oh dear, what a bore!
dislocated | dislocated elements | Dumplings I like.
expl | expletive | There is a ghost in the room.
fixed | non-initial parts of fixed multiword unit | At least there's one of you brave enough! Of course there may be exceptions.
flat | non-initial parts of flat multiword unit | Let's go to San Francisco. What was Miss O'Hara up to?
flat:foreign | non-initial parts of flat multiword unit | During the colonial period it was called the Portal de los Mercaderes.
goeswith | non-initial parts of incorrectly split form | They come here with out legal permission.
iobj | indirect object | He brought us eggs. Can I buy you a drink?
list | non-initial parts of list | Steve Jones tel.: 555-9814 e-mail: jones@abc.edf
mark | marker | I spent the night telling jokes to keep Petrik from falling asleep at the wheel. I just want to know what you are thinking about when you wake up.
nmod | nominal modifier | Did they put some fish near the infant's grave for his journey into the afterlife?
nmod:npmod | noun phrase as adverbial modifier | He was younger then and a lot more agile. It seemed that everyone had trembling hands and tear-filled eyes.
nmod:poss | possessive nominal modifier | Many saw it as a good thing that her show was taken off the air.
nmod:tmod | temporal modifier | In Plenary today I supported the amendment.
nsubj | nominal subject | Those who venture upon its currents look for prosperity or fame, even if they often founder in its depths.
nsubj:pass | nominal subject of passive clause | The horses were adorned with just one red scarf.
nummod | numeric modifier | Dissolution does but give birth to fresh modes of organization, and one death is the parent of a thousand lives.
obj | object | But who can stop the people? What do you mean? I don't know what to do.
obl | oblique nominal | We might bring an avalanche down on ourselves for no good reason.
obl:npmod | noun phrase as oblique nominal | I get fed up a little sometimes.
obl:tmod | temporal modifier | I leave tomorrow. Tell him everything, tonight.
orphan | orphan after elided head | Mary won gold and Peter bronze.
parataxis | parataxis (incl. parentheticals) | “Is that the only reason?” she asked, putting her eyes close to mine.
punct | punctuation | Is that all?
reparandum | overridden disfluency | Go to the right- to the left.
root | root | This was not a good moment in the history of English cuisine.
vocative | vocative | See you later, Sam.
xcomp | open clausal complement | Maria saw me standing at the mirror.
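
Each token carries exactly one syntactic function and points to its head. The sketch below illustrates this on the sentence "He brought us eggs." from the iobj row. The Token layout is a simplified stand-in for the CoNLL-U columns, the annotation was written by hand for illustration, and the helper name dependents is hypothetical.

```python
from typing import NamedTuple


class Token(NamedTuple):
    id: int       # 1-based position in the sentence
    form: str
    lemma: str
    head: int     # id of the governing token, 0 for the root
    deprel: str


# Hand-annotated toy sentence (an assumption, not corpus output).
SENTENCE = [
    Token(1, "He", "he", 2, "nsubj"),
    Token(2, "brought", "bring", 0, "root"),
    Token(3, "us", "we", 2, "iobj"),
    Token(4, "eggs", "egg", 2, "obj"),
    Token(5, ".", ".", 2, "punct"),
]


def dependents(sentence, deprel):
    """Return (dependent form, head form) pairs for a given relation."""
    by_id = {tok.id: tok for tok in sentence}
    return [
        (tok.form, by_id[tok.head].form)
        for tok in sentence
        if tok.deprel == deprel and tok.head != 0
    ]


print(dependents(SENTENCE, "iobj"))  # [('us', 'brought')]
```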

References to syntactic heads

References to function words

Coordination

UD and KonText

Basic query

Query for a lemma and a morphological tag

Query for a part of speech and morphological categories using the menu

Query for a syntactic function

Query results

Formatted text

Syntactic structure display

Examples of queries

The queries mainly show how syntactic functions can be combined with parts of speech and morphological categories, but they also include references to syntactic heads and dependent auxiliaries. Most of the queries concern English. They can be applied to other languages as well, although a specific language may require some modifications to the query. Queries can be entered in one language, or in two or more languages in parallel.

Who are the most likely singers

[deprel="nsubj" & p_lemma="sing"]

What birds do most often

[deprel="nsubj" & lemma="bird"]
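
Both queries appear to rely on the p_ prefix referring to the syntactic head of the matched token, so a question such as "what do birds do most often" comes down to a frequency distribution of p_lemma over the hits. The Python sketch below emulates that step on invented data. The dictionary keys mirror the attribute names used in the queries, and the function name is hypothetical.

```python
from collections import Counter

# Invented hits: each dict is one token with the attributes used above.
TOKENS = [
    {"lemma": "bird", "deprel": "nsubj", "p_lemma": "sing"},
    {"lemma": "bird", "deprel": "nsubj", "p_lemma": "fly"},
    {"lemma": "bird", "deprel": "obj", "p_lemma": "see"},
    {"lemma": "bird", "deprel": "nsubj", "p_lemma": "sing"},
    {"lemma": "girl", "deprel": "nsubj", "p_lemma": "sing"},
]


def head_lemma_frequencies(tokens, lemma, deprel="nsubj"):
    """Count head lemmas of tokens matching [deprel=... & lemma=...]."""
    return Counter(
        tok["p_lemma"]
        for tok in tokens
        if tok["deprel"] == deprel and tok["lemma"] == lemma
    )


print(head_lemma_frequencies(TOKENS, "bird").most_common())
# [('sing', 2), ('fly', 1)]
```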

Nouns following a specific preposition

[case_lemma="about" & case="Acc"]

Verbs taking an indirect object

[deprel="iobj"]

Direct or indirect objects, also as conjuncts

[deprel="i?obj" | deprel="conj" & p_deprel="i?obj"]
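
Two details make this query work: the value "i?obj" is a regular expression matched against the whole attribute value, so it covers both obj and iobj, and the query is read as A | (B & C), assuming that & binds more tightly than |. The sketch below mimics this logic in Python, with re.fullmatch as an approximation of the query language's regular expression matching. The function names are hypothetical.

```python
import re


def value_matches(pattern: str, value: str) -> bool:
    """Whole-value regex match, approximating a CQL attribute condition."""
    return re.fullmatch(pattern, value) is not None


def is_object_or_conjunct(deprel: str, p_deprel: str) -> bool:
    """Emulate [deprel="i?obj" | deprel="conj" & p_deprel="i?obj"]."""
    # Python's `and` binds tighter than `or`, mirroring the assumed
    # precedence of & over | in the query.
    return value_matches("i?obj", deprel) or (
        deprel == "conj" and value_matches("i?obj", p_deprel)
    )


print(is_object_or_conjunct("iobj", "root"))   # True
print(is_object_or_conjunct("conj", "obj"))    # True
print(is_object_or_conjunct("conj", "nsubj"))  # False
```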

Proper nouns as subjects, also as conjuncts

[deprel="nsubj" & upos="PROPN" | deprel="conj" & p_deprel="nsubj" & upos="PROPN"]

Gerunds preceded by "with" as the marker

[verb_form="Ger" & mark_lemma="with"]
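
The attribute verb_form used here is the lower-case rendering of the UD category VerbForm (see note 1). A minimal sketch of that naming convention in Python follows. The function name is hypothetical, and only the names mentioned in this document (verb_form, num_type, case) are known to follow the pattern.

```python
import re


def category_to_attribute(category: str) -> str:
    """Convert a UD category name (VerbForm) to an attribute name (verb_form)."""
    # Insert an underscore before every capital that is not word-initial.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", category).lower()


for name in ("VerbForm", "NumType", "Case"):
    print(name, "->", category_to_attribute(name))
```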

Verbs of sensing followed by an object and an infinitive

1:[lemma="feel|sense|perceive"] []* 2:[deprel="obj"] []* 3:[verb_form="Inf" & deprel="xcomp"] & 2.head=1.id & 3.head=1.id within <s/>
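
In this query the labels 1:, 2: and 3: name three positions, and the conditions 2.head=1.id and 3.head=1.id require both the object and the infinitive to depend on the verb of sensing. The sketch below checks the same constraints by brute force on a hand-annotated toy sentence, "I felt it move." It only illustrates the logic and is not how the corpus engine evaluates queries. The data and the function name are assumptions.

```python
import re
from itertools import combinations

# Hand-annotated toy sentence "I felt it move." (an assumption).
SENT = [
    {"id": 1, "lemma": "I", "deprel": "nsubj", "head": 2, "verb_form": ""},
    {"id": 2, "lemma": "feel", "deprel": "root", "head": 0, "verb_form": "Fin"},
    {"id": 3, "lemma": "it", "deprel": "obj", "head": 2, "verb_form": ""},
    {"id": 4, "lemma": "move", "deprel": "xcomp", "head": 2, "verb_form": "Inf"},
    {"id": 5, "lemma": ".", "deprel": "punct", "head": 2, "verb_form": ""},
]


def find_matches(sentence):
    """Return (verb, object, infinitive) id triples satisfying the query."""
    hits = []
    # combinations() keeps sentence order, like the 1: ... 2: ... 3: sequence.
    for t1, t2, t3 in combinations(sentence, 3):
        if (
            re.fullmatch("feel|sense|perceive", t1["lemma"])
            and t2["deprel"] == "obj"
            and t3["verb_form"] == "Inf"
            and t3["deprel"] == "xcomp"
            and t2["head"] == t1["id"]
            and t3["head"] == t1["id"]
        ):
            hits.append((t1["id"], t2["id"], t3["id"]))
    return hits


print(find_matches(SENT))  # [(2, 3, 4)]
```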

Past conditional passive in Czech

[voice="Pass" & aux_feats="Mood=Cnd" & aux_feats=".*Tense=Past.*Tense=Past.*"]

Past conditional passive in English

[feats="VerbForm=Part" & aux_feats=".*Tense=Past.*VerbForm=Inf.*Tense=Past.*"]
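
The second aux_feats condition is order-sensitive: it asks for Tense=Past, then VerbForm=Inf, then Tense=Past again, as in "would have been". The sketch below shows the effect of such a pattern in Python. It assumes, hypothetically, that aux_feats holds the features of the dependent auxiliaries concatenated in surface order. Both sample values are invented, and the real layout of the attribute may differ.

```python
import re

PATTERN = ".*Tense=Past.*VerbForm=Inf.*Tense=Past.*"

# Invented attribute values; the real layout of aux_feats may differ.
would_have_been = "Tense=Past|VerbForm=Fin|VerbForm=Inf|Tense=Past|VerbForm=Part"
was = "Mood=Ind|Tense=Past|VerbForm=Fin"


def matches(pattern: str, value: str) -> bool:
    """Whole-value regex match, approximating a CQL attribute condition."""
    return re.fullmatch(pattern, value) is not None


print(matches(PATTERN, would_have_been))  # True
print(matches(PATTERN, was))              # False
```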

Continuous perfect

[feats="VerbForm=Ger" & aux_feats="VerbForm=Fin" & aux_feats="VerbForm=Part"]

Passive of first person singular continuous

[aux_lemma="be" & aux_feats="Person=1" & aux_feats="Number=Sing" & aux_feats="VerbForm=Ger" & feats="VerbForm=Part"]

Past perfect

[feats="VerbForm=Part" & aux_lemma="have" & aux_lemma!="be|will|can|may|must" & aux_feats="Mood=Ind" & aux_feats="Tense=Past"]
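
Queries like the ones in this section are conjunctions of attribute conditions, so they can also be assembled programmatically, for instance when many variants are needed. The helper below is a hypothetical sketch in Python. It supports only the = and != operators joined by &, and it does not escape quotes inside values.

```python
def build_query(conditions):
    """Join (attribute, operator, value) triples into one CQL token query."""
    parts = []
    for attribute, operator, value in conditions:
        if operator not in ("=", "!="):
            raise ValueError(f"unsupported operator: {operator}")
        parts.append(f'{attribute}{operator}"{value}"')
    return "[" + " & ".join(parts) + "]"


# Rebuild the past perfect query shown above.
past_perfect = build_query([
    ("feats", "=", "VerbForm=Part"),
    ("aux_lemma", "=", "have"),
    ("aux_lemma", "!=", "be|will|can|may|must"),
    ("aux_feats", "=", "Mood=Ind"),
    ("aux_feats", "=", "Tense=Past"),
])
print(past_perfect)
```

A list of triples is used instead of a dictionary because the same attribute, such as aux_feats, may appear in several conditions.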

Description of the list of attributes

Basic attributes

Structural attributes

Function word attributes

Attributes representing selected categories

Errors and shortcomings of linguistic annotation according to UD

The quality of annotation varies across languages, mainly because of differences in the volume and quality of the training data. It is also affected by the annotation method and the tool used.

We are grateful for every reported error, discrepancy or deficiency, and for any comments and suggestions. Please send them to CNC user support and include the abbreviation "UD" at the beginning of the message subject.

References

Selection of literature about UD

Marie-Catherine de Marneffe, Christopher Manning, Joakim Nivre, Daniel Zeman (2021): Universal Dependencies. In: Computational Linguistics, ISSN 1530-9312, vol. 47, no. 2, pp. 255-308.

Daniel Zeman (2018): The World of Tokens, Tags and Trees. ISBN 978-80-88132-09-7.

For a complete list, see here.

Tutorials and lectures about UD

Daniel Zeman: Universal Dependencies and the Slavic Languages. Warsaw, 19.11.2018.

Joakim Nivre, Daniel Zeman, Filip Ginter, Francis M. Tyers: Tutorial on Universal Dependencies: Adding a new language to UD

Anna Nedoluzhko, Michal Novak, Martin Popel, Zdenek Zabokrtsky and Daniel Zeman: Coreference meets Universal Dependencies. Prague, 19/04/2021.

Daniel Zeman: Reflexives in Universal Dependencies. Prague, 04/03/2019.

1)
Note that for technical reasons the names of the categorial attributes are all in lower case, including names such as VerbForm (in feats), rendered as verb_form, or NumType, rendered as num_type. The attribute values, such as Fem, retain the initial upper case character, but are enclosed in double quotes, like other attribute values outside feats.
2)
Aggregates are present in the following languages: ar, ca, cs, de, el, en, es, fi, fr, he, it, pl, pt, tr and uk. A list of all aggregates for a given language is displayed as the frequency distribution of word forms following the query [sword = ".|.+"].
3)
The first form, preceding the dash, is the original form, i.e. the value of the iword attribute. The second form, after the dash, is the reconstructed form, i.e. the value of the sword attribute. If a parenthesis includes just one form, the two options are identical, or the given language does not provide reconstructed forms.
4)
The constituent performing the given function is highlighted. If the constituent consists of more than one word, the constituent's governor (head word) is underlined. It is this token which is annotated by the given function.
5)
In a basic query, it is no longer necessary in some languages to separate the parts of an aggregate with a space, e.g. był, by, and m of the Polish agglutinated form byłbym, or is and n't of the English contraction isn't, even in a longer expression (aren't I). However, a basic query for is or n't will not show concordances including the form isn't.
6)
In the current release, only very few occurrences of personal pronouns in the subject position (such as she) are annotated as case="Nom".
7)
The automatic annotation of these sentences is not 100% consistent. Sometimes with is tagged as a preposition and attached to the subject of the non-finite clause, as in She ran off, with the toy spider scuttling obediently after her.