Multilingual Amnesia: On the Transferability of Unlearning in Multilingual LLMs
Alireza Dehghanpour Farashah, Aditi Khandelwal, Marylou Fauchard, Zhuan Shi, Negar Rostamzadeh, and Golnoosh Farnadi

TL;DR
This paper investigates the transferability of unlearning in multilingual large language models, focusing on data and concept unlearning across ten diverse languages, revealing insights into stability and cross-lingual effects.
Contribution
It extends unlearning benchmarks to ten languages and analyzes how linguistic similarities influence unlearning transfer in multilingual LLMs.
Findings
Unlearning is more stable in high-resource languages.
Asymmetric transfer effects occur between related languages.
Syntactic similarity predicts cross-lingual unlearning behavior.
Abstract
As multilingual large language models become more widely used, ensuring their safety and fairness across diverse linguistic contexts presents unique challenges. While existing research on machine unlearning has primarily focused on monolingual settings, typically English, multilingual environments introduce additional complexities due to cross-lingual knowledge transfer and biases embedded in both pretraining and fine-tuning data. In this work, we study multilingual unlearning using the Aya-Expanse 8B model under two settings: (1) data unlearning and (2) concept unlearning. We extend benchmarks for factual knowledge and stereotypes to ten languages through translation: English, French, Arabic, Japanese, Russian, Farsi, Korean, Hindi, Hebrew, and Indonesian. These languages span five language families and a wide range of resource levels. Our experiments show that unlearning in…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsExplainable Artificial Intelligence (XAI) · Topic Modeling · Natural Language Processing Techniques
