Shared Path: Unraveling Memorization in Multilingual LLMs through Language Similarities
Xiaoyu Luo, Yiyi Chen, Johannes Bjerva, Qiongxiu Li

TL;DR
This paper investigates memorization in multilingual large language models, revealing that language similarities influence memorization patterns and highlighting the importance of a language-aware approach for understanding and mitigating memorization vulnerabilities.
Contribution
The paper introduces a novel graph-based correlation metric that incorporates language similarity to analyze cross-lingual memorization in multilingual large language models (MLLMs), offering new insight into multilingual memorization behavior.
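To make the idea of a similarity-aware correlation concrete, here is a minimal sketch in Python. It is not the paper's metric, whose definition is not given in this summary. It assumes a hypothetical setup: each language has a training-token count and a memorization rate, and a pairwise similarity score decides which languages are connected by an edge. The function name `edge_correlation`, the `threshold` parameter, and all toy values are illustrative assumptions. The sketch correlates log-token differences with memorization differences over connected pairs only, so dissimilar languages never get compared.

```python
import math
from itertools import combinations


def edge_correlation(tokens, memorization, similarity, threshold=0.5):
    """Correlate token-count and memorization differences over similar pairs.

    Illustrative only (NOT the paper's metric). Each pair of languages whose
    similarity is >= threshold counts as a graph edge. Every edge contributes
    both orientations (a-b and b-a), so the differences have zero mean and the
    result does not depend on how the languages happen to be ordered.
    """
    sum_xy = sum_xx = sum_yy = 0.0
    for a, b in combinations(sorted(tokens), 2):
        sim = similarity.get((a, b), similarity.get((b, a), 0.0))
        if sim < threshold:
            continue  # dissimilar languages are not compared
        dx = math.log(tokens[a]) - math.log(tokens[b])
        dy = memorization[a] - memorization[b]
        # Adding both orientations doubles each term symmetrically.
        sum_xy += 2 * dx * dy
        sum_xx += 2 * dx * dx
        sum_yy += 2 * dy * dy
    if sum_xx == 0 or sum_yy == 0:
        return float("nan")  # no edges, or no variation along the edges
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Toy, made-up values for four placeholder languages (not data from the paper).
toy_tokens = {"A": 1000, "B": 100, "C": 10000, "D": 500}
toy_memorization = {"A": 0.01, "B": 0.05, "C": 0.02, "D": 0.04}
toy_similarity = {("A", "B"): 0.9, ("C", "D"): 0.8, ("A", "C"): 0.1}

score = edge_correlation(toy_tokens, toy_memorization, toy_similarity)
```

In this toy setup the lower-resource language in each similar pair has the higher memorization rate, so `score` is negative. A correlation computed over all languages at once could hide such a pattern if dissimilar languages behave differently.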
Findings
Among similar languages, those with fewer training tokens tend to show higher memorization.
Language similarity explains memorization patterns across languages.
Cross-lingual transferability is linked to language similarity.
Abstract
We present the first comprehensive study of Memorization in Multilingual Large Language Models (MLLMs), analyzing 95 languages using models across diverse model scales, architectures, and memorization definitions. As MLLMs are increasingly deployed, understanding their memorization behavior has become critical. Yet prior work has focused primarily on monolingual models, leaving multilingual memorization underexplored, despite the inherently long-tailed nature of training corpora. We find that the prevailing assumption, that memorization is highly correlated with training data availability, fails to fully explain memorization patterns in MLLMs. We hypothesize that the conventional focus on monolingual settings, effectively treating languages in isolation, may obscure the true patterns of memorization. To address this, we propose a novel graph-based correlation metric that incorporates…
Taxonomy
Topics: Natural Language Processing Techniques · Library Science and Information Systems · Translation Studies and Practices
