UniMorph 4.0: Universal Morphology
Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser

TL;DR
UniMorph 4.0 expands a multilingual morphological database with 67 new languages, schema improvements, morpheme segmentation, and derivational morphology, enhancing its coverage and linguistic inclusiveness.
Contribution
The paper introduces significant updates to UniMorph, including new languages, schema enhancements, morpheme segmentation, and derivational morphology integration.
Findings
Added 67 new languages, including 30 endangered ones
Improved extraction pipeline for missing morphological features
Augmented database with morpheme segmentation and derivational data
Abstract
The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. This paper presents the expansions and improvements made on several fronts over the last couple of years (since McCarthy et al. (2020)). Collaborative efforts by numerous linguists have added 67 new languages, including 30 endangered languages. We have implemented several improvements to the extraction pipeline to tackle some issues, e.g. missing gender and macron information. We have also amended the schema to use a hierarchical structure that is needed for morphological phenomena like…
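To make the idea of a type-level resource of inflection tables concrete, here is a minimal sketch of reading such data in Python. It assumes each entry is a tab-separated triple (lemma, inflected form, feature bundle) with feature tags joined by semicolons. The sample rows, the function names `parse_triples` and `build_tables`, and the flat tag layout are illustrative assumptions, not material taken from the paper, and the sketch does not model the hierarchical schema mentioned in the abstract.

```python
from collections import defaultdict

# Illustrative sample only (assumed layout: lemma <TAB> form <TAB> tag;tag;...).
SAMPLE = (
    "hablar\thablo\tV;IND;PRS;1;SG\n"
    "hablar\thablas\tV;IND;PRS;2;SG\n"
    "\n"
    "comer\tcomo\tV;IND;PRS;1;SG\n"
)


def parse_triples(text):
    """Return a list of (lemma, form, features) tuples, skipping blank lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        lemma, form, tags = line.split("\t")
        # Features are kept as an ordered tuple so they can serve as dict keys.
        entries.append((lemma, form, tuple(tags.split(";"))))
    return entries


def build_tables(entries):
    """Group entries into one inflection table per lemma: features -> form."""
    tables = defaultdict(dict)
    for lemma, form, features in entries:
        tables[lemma][features] = form
    return dict(tables)


TABLES = build_tables(parse_triples(SAMPLE))
```

Under these assumptions, looking up `TABLES["hablar"][("V", "IND", "PRS", "1", "SG")]` retrieves the form for that feature bundle. Keying on the full tag tuple is a simplification: if one lemma has several forms for the same bundle, later rows overwrite earlier ones, so a real loader would likely store a list of forms per key.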
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Authorship Attribution and Profiling
