Marrying Universal Dependencies and Universal Morphology
Arya D. McCarthy, Miikka Silfverberg, Ryan Cotterell, Mans Hulden, David Yarowsky

TL;DR
This paper introduces a deterministic mapping from Universal Dependencies to UniMorph, enabling interoperability between the two schemas and enhancing their utility for linguistic tasks, while critically evaluating their respective foundations.
Contribution
The paper provides the first deterministic mapping from UD v2 features into the UniMorph schema, facilitating cross-resource validation and combined use of both resources in linguistic applications.
Findings
Achieved 64.13% macro-average recall when validating the mapping by lookup in the UniMorph corpora
Identified data scarcity issues affecting compatibility
Provided a critical evaluation of UD and UniMorph schemas
Abstract
The Universal Dependencies (UD) and Universal Morphology (UniMorph) projects each present schemata for annotating the morphosyntactic details of language. Each project also provides corpora of annotated text in many languages - UD at the token level and UniMorph at the type level. As each corpus is built by different annotators, language-specific decisions hinder the goal of universal schemata. With compatibility of tags, each project's annotations could be used to validate the other's. Additionally, the availability of both type- and token-level resources would be a boon to tasks such as parsing and homograph disambiguation. To ease this interoperability, we present a deterministic mapping from Universal Dependencies v2 features into the UniMorph schema. We validate our approach by lookup in the UniMorph corpora and find a macro-average of 64.13% recall. We also note incompatibilities…
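To make the idea concrete, here is a minimal Python sketch of what a deterministic feature mapping and a lookup-based, macro-averaged recall could look like. It is an illustration under stated assumptions, not the authors' implementation: the table `UD_TO_UNIMORPH` is a tiny hand-picked subset, the helper names (`convert`, `recall`, `macro_average`) are hypothetical, the languages and lexicon entries are invented toy data, and the paper's actual validation procedure may differ in its details.

```python
from typing import Dict, List, Set, Tuple

# Assumption: a toy, hand-picked subset of UD v2 feature=value pairs and
# plausible UniMorph counterparts. The real mapping is far larger.
UD_TO_UNIMORPH: Dict[str, str] = {
    "Number=Sing": "SG",
    "Number=Plur": "PL",
    "Case=Nom": "NOM",
    "Case=Acc": "ACC",
    "Tense=Past": "PST",
    "Tense=Pres": "PRS",
}


def convert(ud_feats: str) -> Set[str]:
    """Map a UD FEATS string such as 'Case=Nom|Number=Plur' to UniMorph tags."""
    if ud_feats in ("", "_"):  # '_' marks an empty FEATS column in CoNLL-U
        return set()
    # Features missing from the toy table are silently skipped.
    return {UD_TO_UNIMORPH[f] for f in ud_feats.split("|") if f in UD_TO_UNIMORPH}


def recall(tokens: List[Tuple[str, str]], lexicon: Dict[str, List[Set[str]]]) -> float:
    """Share of (form, UD feats) pairs whose converted tags appear in the lexicon."""
    hits = sum(1 for form, feats in tokens if convert(feats) in lexicon.get(form, []))
    return hits / len(tokens)


def macro_average(per_language: Dict[str, float]) -> float:
    """Unweighted mean of per-language scores."""
    return sum(per_language.values()) / len(per_language)


# Invented toy data for two hypothetical languages.
tokens_a = [("dogs", "Number=Plur"), ("dog", "Number=Sing")]
lexicon_a = {"dogs": [{"PL"}], "dog": [{"SG"}]}
tokens_b = [("walked", "Tense=Past"), ("walks", "Tense=Pres")]
lexicon_b = {"walked": [{"PST"}], "walks": [{"PL"}]}  # second entry disagrees

scores = {"lang_a": recall(tokens_a, lexicon_a), "lang_b": recall(tokens_b, lexicon_b)}
macro = macro_average(scores)
```

The macro-average weights every language equally regardless of corpus size, so a small language with poor coverage pulls the score down as much as a large one would.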
