Layer-Targeted Multilingual Knowledge Erasure in Large Language Models
Taoran Li, Varun Chandrasekaran, Zhiyuan Yu

TL;DR
This paper investigates how to erase specific knowledge from large language models across languages. It identifies the intervention layer as the deciding factor and proposes a targeted unlearning framework that aims to remove the knowledge while preserving multilingual capabilities.
Contribution
The paper introduces MUTE, a framework that uses layer-wise analysis to choose where to intervene. It targets the failure of earlier unlearning methods to generalize across languages, where knowledge erased in one language stays accessible through others.
Findings
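Intervention depth determines whether multilingual unlearning succeeds.
Shallow-layer interventions achieve erasure but collapse multilingual capabilities in held-out languages.
Deep-layer interventions preserve utility but fail to erase the target knowledge, even in the source languages.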
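The two failure modes can be read as a pair of checks applied to each candidate layer in a layer-wise sweep. The sketch below is only an illustration of that bookkeeping, not the MUTE procedure itself: the names `LayerResult` and `classify`, the two metrics, and both thresholds are assumptions introduced here, and the numbers in the usage lines are invented toy values, not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class LayerResult:
    """Hypothetical record for one unlearning run at a given layer."""
    layer: int
    forget_acc: float        # accuracy on the target knowledge after unlearning (lower = better erasure)
    heldout_utility: float   # utility on held-out languages after unlearning (higher = better)


def classify(result: LayerResult,
             max_forget_acc: float = 0.10,
             min_utility: float = 0.80) -> str:
    """Label a run by which of the two requirements it meets.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    erased = result.forget_acc <= max_forget_acc
    preserved = result.heldout_utility >= min_utility
    if erased and preserved:
        return "success"
    if erased:
        return "erases_but_collapses_utility"   # the shallow-layer failure mode
    if preserved:
        return "preserves_but_fails_to_erase"   # the deep-layer failure mode
    return "fails_both"


# Toy sweep with made-up numbers, purely to show the labels.
sweep = [
    LayerResult(layer=2, forget_acc=0.05, heldout_utility=0.40),
    LayerResult(layer=30, forget_acc=0.70, heldout_utility=0.95),
]
for run in sweep:
    print(run.layer, classify(run))
```

Keeping erasure and utility as separate checks, rather than folding them into one score, makes it visible which of the two requirements a given depth violates.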
Abstract
Recent work has demonstrated that machine unlearning in Large Language Models (LLMs) fails to generalize across languages: knowledge erased in one language frequently remains accessible through others. However, the underlying cause of this failure and a principled solution remain open. In this work, we identify intervention depth as the key factor determining multilingual generalization. Through systematic layer-wise experiments, we characterize two distinct failure modes: shallow-layer interventions achieve erasure but collapse multilingual capabilities in held-out languages, while deep-layer interventions preserve utility but fail to erase target knowledge even in source languages. These findings reveal that the choice of intervention layer is not a free parameter; it fundamentally determines whether multilingual unlearning succeeds. We propose MUTE (Multilingual Unlearning via…
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)
