Pretrained Multilingual Transformers Reveal Quantitative Distance Between Human Languages
Yue Zhao, Jiatao Gu, Paloma Jeretič, Weijie Su

TL;DR
This paper introduces a method that uses pretrained multilingual transformers to measure the distance between languages quantitatively. The resulting distances align with known linguistic and geographic relationships, and using them as a regularizer improves low-resource translation.
Contribution
The paper presents Attention Transport Distance (ATD), a scalable, tokenization-agnostic metric of cross-linguistic distance derived from the attention matrices of pretrained multilingual models.
Findings
ATD accurately recovers linguistic groupings
ATD reveals geographic and contact-induced language patterns
Using ATD as a regularizer enhances low-resource translation
Abstract
Understanding the distance between human languages is central to linguistics, anthropology, and tracing human evolutionary history. Yet, while linguistics has long provided rich qualitative accounts of cross-linguistic variation, a unified and scalable quantitative approach to measuring language distance remains lacking. In this paper, we introduce a method that leverages pretrained multilingual language models as systematic instruments for linguistic measurement. Specifically, we show that the spontaneously emerged attention mechanisms of these models provide a robust, tokenization-agnostic measure of cross-linguistic distance, termed Attention Transport Distance (ATD). By treating attention matrices as probability distributions and measuring their geometric divergence via optimal transport, we quantify the representational distance between languages during translation. Applying ATD to…
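The abstract describes ATD only at a high level: attention matrices are treated as probability distributions and compared via optimal transport. The following is a minimal sketch of that general idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names, the placement of attention cells at normalized coordinates in the unit square (so matrices of different token lengths can be compared), the squared Euclidean ground cost, and the use of entropy-regularized Sinkhorn iterations with the hypothetical parameters `eps` and `iters`.

```python
import math

def to_distribution(attn):
    """Flatten a non-negative attention matrix into (points, weights).

    Assumption: cell (i, j) is placed at normalized coordinates in [0, 1]^2,
    so matrices with different numbers of tokens share one geometry.
    """
    n, m = len(attn), len(attn[0])
    total = sum(sum(row) for row in attn)
    points, weights = [], []
    for i, row in enumerate(attn):
        for j, w in enumerate(row):
            if w > 0:  # skip empty cells; they carry no probability mass
                x = i / (n - 1) if n > 1 else 0.5
                y = j / (m - 1) if m > 1 else 0.5
                points.append((x, y))
                weights.append(w / total)
    return points, weights

def attention_transport_cost(attn_a, attn_b, eps=0.05, iters=500):
    """Entropy-regularized optimal transport cost between two attention matrices.

    Hypothetical helper: returns the transport cost <P, C> of the Sinkhorn plan,
    with C the squared Euclidean distance between normalized cell positions.
    """
    pts_a, a = to_distribution(attn_a)
    pts_b, b = to_distribution(attn_b)
    cost = [[(xa - xb) ** 2 + (ya - yb) ** 2 for (xb, yb) in pts_b]
            for (xa, ya) in pts_a]
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * len(a)
    v = [1.0] * len(b)
    for _ in range(iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(len(b)))
             for i in range(len(a))]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(len(a)))
             for j in range(len(b))]
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(len(a)) for j in range(len(b)))

# Toy attention patterns (made up for illustration, not data from the paper).
DIAG3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # monotone alignment
ANTI3 = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]   # reversed alignment
DIAG2 = [[1, 0], [0, 1]]                    # monotone, different length

d_same = attention_transport_cost(DIAG3, DIAG3)
d_anti = attention_transport_cost(DIAG3, ANTI3)
d_shape = attention_transport_cost(DIAG2, DIAG3)
```

In this toy setup, two monotone alignments of different lengths come out closer to each other than a monotone and a reversed alignment, which is the kind of shape-independent comparison a tokenization-agnostic distance needs. How the paper selects layers and heads, chooses the ground cost, and aggregates over sentences is not stated in the abstract and is not modeled here.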
Taxonomy
Topics: Language and cultural evolution · Natural Language Processing Techniques · Authorship Attribution and Profiling
