Uncovering the Potential Risks in Unlearning: Danger of English-only Unlearning in Multilingual LLMs
Kyomin Hwang, Hyeonjin Kim, Seungyeon Kim, Sunghyun Wee, Nojun Kwak

TL;DR
This paper investigates the risks of language confusion in multilingual LLMs during unlearning, revealing limitations of current metrics and proposing new evaluation methods to better assess unlearning effectiveness.
Contribution
It introduces the N-gram-based Language-Mix (N-Mix) score to quantify language confusion and advocates semantic-based metrics for more accurate unlearning evaluation.
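The summary does not state how the N-Mix score is computed. To make the idea concrete, the sketch below assumes one plausible definition and is not the authors' formula. It detects each word's language by its Unicode script (an assumption that only works when the languages involved use distinct scripts, e.g. Latin vs. Hangul), forms word n-grams, and reports the fraction of n-grams containing at least one word outside the prompt's script. The names `token_script` and `n_mix_score` are hypothetical.

```python
import unicodedata
from collections import Counter
from typing import Optional


def token_script(token: str) -> Optional[str]:
    """Return the dominant Unicode script of a token (e.g. 'LATIN', 'HANGUL'),
    or None if it has no alphabetic characters (digits, punctuation)."""
    scripts = Counter(
        # The first word of a character's Unicode name is a rough script label,
        # e.g. 'LATIN SMALL LETTER A' -> 'LATIN', 'HANGUL SYLLABLE HAN' -> 'HANGUL'.
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in token
        if ch.isalpha()
    )
    if not scripts:
        return None
    return scripts.most_common(1)[0][0]


def n_mix_score(response: str, target_script: str, n: int = 2) -> float:
    """Hypothetical N-gram language-mix score (an assumption, not the paper's
    definition): the fraction of word n-grams that contain at least one word
    written in a script other than `target_script`.
    0.0 = entirely in the prompt's script; 1.0 = every n-gram is mixed."""
    # Drop script-neutral tokens (numbers, punctuation) before forming n-grams.
    scripts = [s for s in (token_script(t) for t in response.split()) if s is not None]
    if not scripts:
        return 0.0
    n = min(n, len(scripts))  # very short responses fall back to smaller n-grams
    ngrams = [scripts[i:i + n] for i in range(len(scripts) - n + 1)]
    mixed = sum(any(s != target_script for s in gram) for gram in ngrams)
    return mixed / len(ngrams)


# English prompt, so the expected script is LATIN.
print(n_mix_score("The capital of France is Paris", "LATIN"))  # 0.0
print(n_mix_score("The capital is 파리", "LATIN"))              # 1 of 3 bigrams mixed
print(n_mix_score("프랑스의 수도는 파리입니다", "LATIN"))        # 1.0
```

Using n-grams rather than single words means one stray foreign word contaminates every n-gram that contains it, so brief code-switching registers more strongly than a unigram ratio would. A real implementation would likely need a proper language identifier for languages that share a script.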
Findings
Language confusion is widespread in multilingual LLMs after fine-tuning and unlearning.
Reference-based metrics often produce false negatives when language confusion is high.
Semantic-based metrics are necessary for accurate assessment of unlearning in multilingual models.
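A minimal illustration of why reference-based metrics break under language confusion. The sketch uses a simple ROUGE-1-style unigram recall written from scratch (a stand-in for reference-based metrics in general, not the exact metric used in the paper). A model that still knows the answer but replies in another language (Korean here, chosen only for illustration) scores exactly the same as a model that has genuinely forgotten, so the retained fact goes undetected. A semantic-based metric, for example one that compares multilingual sentence embeddings, would not depend on surface-token overlap in this way.

```python
from collections import Counter


def unigram_recall(reference: str, response: str) -> float:
    """ROUGE-1-style recall: fraction of reference tokens (with multiplicity)
    that also appear in the response. Lower-cased, whitespace-tokenized."""
    ref = Counter(reference.lower().split())
    hyp = Counter(response.lower().split())
    if not ref:
        return 0.0
    overlap = sum(min(count, hyp[tok]) for tok, count in ref.items())
    return overlap / sum(ref.values())


reference = "the capital of france is paris"
same_language = "the capital of france is paris"
confused = "프랑스의 수도는 파리입니다"  # same fact, answered in Korean
forgotten = "i do not know"             # knowledge actually removed

for name, resp in [("same_language", same_language),
                   ("confused", confused),
                   ("forgotten", forgotten)]:
    print(name, unigram_recall(reference, resp))
# same_language 1.0
# confused 0.0
# forgotten 0.0
```

The `confused` and `forgotten` responses receive identical scores, so this metric alone cannot tell leaked knowledge from successful unlearning.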
Abstract
There have been a couple of studies showing that attempting to erase multilingual knowledge using only English data is insufficient for multilingual LLMs. However, their analyses remain highly performance-oriented. In this paper, we shift the perspective to evaluation and address an additional blind spot that emerges when the multilingual LLM is fully fine-tuned on a parallel multilingual dataset before unlearning. In this setting, language confusion occurs: the model responds in a language different from that of the input prompt. Language confusion is problematic for unlearning because it causes standard reference-based metrics to fail. We tackle this phenomenon in three steps: (1) we introduce the N-gram-based Language-Mix (N-Mix) score to quantitatively show that language confusion is pervasive and consistent in multilingual LLMs, (2) we demonstrate that reference-based metrics result…
