URIEL+: Enhancing Linguistic Inclusion and Usability in a Typological and Multilingual Knowledge Base
Aditya Khan, Mason Shipton, David Anugraha, Kaiyao Duan, Phuong H. Hoang, Eric Khiu, A. Seza Doğruöz, En-Shiun Annie Lee

TL;DR
URIEL+ is an improved multilingual knowledge base that expands linguistic coverage, improves usability, and provides distance measures that better align with linguistic studies, supporting linguistic research and applications.
Contribution
This paper introduces URIEL+, an enhanced version of URIEL that expands typological feature coverage, improves the user experience, and offers more accurate, customizable distance calculations.
Findings
Expanded typological feature coverage for 2898 languages.
Distance calculations that better align with linguistic distance studies.
Competitive performance on downstream linguistic tasks.
Abstract
URIEL is a knowledge base offering geographical, phylogenetic, and typological vector representations for 7970 languages. It also provides distance measures between these vectors for 4005 languages, accessible through the lang2vec tool. Although frequently cited, URIEL is limited in linguistic inclusion and overall usability. To address these limitations, we introduce URIEL+, an enhanced version of URIEL and lang2vec. In addition to expanding typological feature coverage for 2898 languages, URIEL+ improves the user experience with robust, customizable distance calculations that better suit users' needs. These upgrades also deliver competitive performance on downstream tasks and yield distances that better align with linguistic distance studies.
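URIEL's language distances come from comparing languages' feature vectors. As an illustration only, the sketch below computes a distance between two typological vectors, skipping any feature that is missing for either language and letting the caller choose between cosine and angular distance. The function name `typological_distance`, the use of `None` to mark a missing feature, and the two metric options are assumptions made for this example; this is not the lang2vec or URIEL+ API, just one way a customizable distance could be structured.

```python
import math
from typing import Optional, Sequence


def typological_distance(
    u: Sequence[Optional[float]],
    v: Sequence[Optional[float]],
    metric: str = "cosine",
) -> float:
    """Hypothetical distance between two typological feature vectors.

    Assumption: None marks a feature with no recorded value. Features
    missing in either language are dropped, so only features attested
    for both languages contribute to the distance.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    # Keep only positions where both languages have a known value.
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no shared features to compare")
    dot = sum(a * b for a, b in pairs)
    norm_u = math.sqrt(sum(a * a for a, _ in pairs))
    norm_v = math.sqrt(sum(b * b for _, b in pairs))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("distance is undefined for an all-zero vector")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_sim = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    if metric == "cosine":
        return 1.0 - cos_sim
    if metric == "angular":
        # Angle between the vectors, scaled from [0, pi] to [0, 1].
        return math.acos(cos_sim) / math.pi
    raise ValueError(f"unknown metric: {metric!r}")


if __name__ == "__main__":
    # Toy binary feature vectors (made up for illustration).
    eng = [1, 0, 1, None, 1]
    fra = [1, 1, 1, 0, None]
    print(typological_distance(eng, fra))
    print(typological_distance(eng, fra, metric="angular"))
```

With these toy vectors, only the first three features are known for both languages, so the last two are ignored. Dropping missing features is the simplest choice; imputing them instead (for example, from related languages) would let more features contribute, at the cost of relying on predicted rather than attested values.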
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Balanced Selection · ALIGN
