Uncovering the Potential Risks in Unlearning: Danger of English-only Unlearning in Multilingual LLMs

Kyomin Hwang; Hyeonjin Kim; Seungyeon Kim; Sunghyun Wee; Nojun Kwak

arXiv:2510.23949·cs.CL·October 29, 2025

TL;DR

This paper investigates the risks of language confusion in multilingual LLMs during unlearning, revealing limitations of current metrics and proposing new evaluation methods to better assess unlearning effectiveness.

Contribution

It introduces the N-gram-based Language-Mix score to quantify language confusion and advocates for semantic-based metrics for more accurate unlearning evaluation.

Findings

01. Language confusion is widespread in multilingual LLMs after fine-tuning and unlearning.

02. Reference-based metrics often produce false negatives in high language confusion scenarios.

03. Semantic-based metrics are necessary for accurate assessment of unlearning in multilingual models.
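To see why a reference-based metric can mislead under language confusion, consider a toy unigram-recall score, loosely in the spirit of ROUGE-1 recall. It is a simplification and not necessarily a metric the paper evaluates. The function name and the example sentences are assumptions made for illustration.

```python
def unigram_recall(reference, response):
    """Fraction of whitespace-separated reference tokens that also appear
    in the response (a simplified, ROUGE-1-recall-like overlap score)."""
    ref_tokens = reference.lower().split()
    resp_tokens = set(response.lower().split())
    if not ref_tokens:
        return 0.0
    hits = sum(1 for tok in ref_tokens if tok in resp_tokens)
    return hits / len(ref_tokens)


reference = "the capital of france is paris"

# Same content, answered in the reference language.
same_language = "the capital of france is paris"

# Same content, but the model answered in Korean (language confusion).
other_language = "프랑스의 수도는 파리입니다"

print(unigram_recall(reference, same_language))
print(unigram_recall(reference, other_language))
```

The second response conveys the same fact yet shares no tokens with the English reference, so a surface-overlap score alone could be read as the knowledge being absent. A semantic comparison would not depend on the response language in this way.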

Abstract

A few studies have shown that attempting to erase multilingual knowledge using only English data is insufficient for multilingual LLMs. However, their analyses remain highly performance-oriented. In this paper, we shift the focus to evaluation and address an additional blind spot that emerges when a multilingual LLM is fully fine-tuned on a parallel multilingual dataset before unlearning. Here, language confusion occurs, whereby a model responds in a language different from that of the input prompt. Language confusion is problematic in unlearning because it causes standard reference-based metrics to fail. We tackle this phenomenon in three steps: (1) introduce the N-gram-based Language-Mix (N-Mix) score to quantitatively show that language confusion is pervasive and consistent in multilingual LLMs, (2) demonstrate that reference-based metrics result…
