Modality Matching Matters: Calibrating Language Distances for Cross-Lingual Transfer in URIEL+
York Hay Ng, Aditya Khan, Xiang Lu, Matteo Salloum, Michael Zhou, Phuong H. Hoang, A. Seza Doğruöz, En-Shiun Annie Lee

TL;DR
This paper introduces a new framework for type-matched language distances that improves cross-lingual transfer by using structure-aware representations and a composite distance measure, addressing limitations of existing knowledge bases.
Contribution
It proposes novel, structure-aware representations for geographic, genetic, and typological distances and unifies them into a robust, task-agnostic composite distance for better cross-lingual transfer.
Findings
Significantly improves transfer performance when the distance type is relevant.
Yields gains in most zero-shot transfer benchmarks.
Enhances the utility of linguistic knowledge bases for cross-lingual tasks.
Abstract
Existing linguistic knowledge bases such as URIEL+ provide valuable geographic, genetic and typological distances for cross-lingual transfer but suffer from two key limitations. First, their one-size-fits-all vector representations are ill-suited to the diverse structures of linguistic data. Second, they lack a principled method for aggregating these signals into a single, comprehensive score. In this paper, we address these gaps by introducing a framework for type-matched language distances. We propose novel, structure-aware representations for each distance type: speaker-weighted distributions for geography, hyperbolic embeddings for genealogy, and a latent variable model for typology. We unify these signals into a robust, task-agnostic composite distance. Across multiple zero-shot transfer benchmarks, we demonstrate that our representations significantly improve transfer performance…
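The abstract names the representations but gives no formulas, so the following is only a minimal sketch of how such distances could be computed, under loudly stated assumptions. It assumes genealogical embeddings live in the Poincaré ball model of hyperbolic space (a common choice for tree-like data, not confirmed here). It treats a language's geography as a set of hypothetical `(lat, lon, speakers)` sites and takes the expected great-circle distance between a random speaker of each language (one plausible reading of "speaker-weighted distributions"). It forms the composite as a weighted mean of pre-normalised distances (a hypothetical aggregation; the paper's own method is not described in this summary). The latent variable model for typology is not sketched. All function names are illustrative and are not part of the URIEL+ API.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical site record: (latitude_deg, longitude_deg, speaker_count).
Site = Tuple[float, float, float]


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Geodesic distance between two points strictly inside the unit Poincare ball.

    Assumption: genealogical embeddings use the Poincare ball model.
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def haversine_km(p: Tuple[float, float], q: Tuple[float, float],
                 radius_km: float = 6371.0) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp guards against h drifting slightly above 1 for near-antipodal points.
    return 2.0 * radius_km * math.asin(min(1.0, math.sqrt(h)))


def speaker_weighted_distance(sites_a: Sequence[Site], sites_b: Sequence[Site]) -> float:
    """Expected great-circle distance between a random speaker of A and one of B.

    Assumption: each language is a discrete distribution over sites,
    weighted by speaker counts.
    """
    total_a = sum(s for _, _, s in sites_a)
    total_b = sum(s for _, _, s in sites_b)
    expected = 0.0
    for lat_a, lon_a, w_a in sites_a:
        for lat_b, lon_b, w_b in sites_b:
            weight = (w_a / total_a) * (w_b / total_b)
            expected += weight * haversine_km((lat_a, lon_a), (lat_b, lon_b))
    return expected


def composite_distance(distances: Dict[str, float], weights: Dict[str, float]) -> float:
    """Hypothetical composite: weighted mean of distances already scaled to [0, 1]."""
    total = sum(weights.values())
    return sum(weights[k] * distances[k] for k in weights) / total
```

For example, `poincare_distance([0.0, 0.0], [0.5, 0.0])` equals `math.log(3)`, and a language with all speakers at one site is at distance 0 from another language concentrated at the same site. Averaging over speaker pairs, rather than comparing single centroid coordinates, keeps widely dispersed languages from collapsing to one misleading point.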
Taxonomy
Topics: Language and cultural evolution · Natural Language Processing Techniques · Speech Recognition and Synthesis
