Keeping an Eye on LLM Unlearning: The Hidden Risk and Remedy

Jie Ren; Zhenwei Dai; Xianfeng Tang; Yue Xing; Shenglai Zeng; Hui Liu; Jingying Zeng; Qiankun Peng; Samarth Varshney; Suhang Wang; Qi He; Charu C. Aggarwal; Hui Liu

arXiv:2506.00359·cs.CR·June 3, 2025

TL;DR

This paper uncovers a vulnerability in fine-tuning-based unlearning of LLMs, where malicious manipulation can degrade model utility, and proposes a scope-aware method to mitigate this risk effectively.

Contribution

The paper identifies a critical security flaw in existing unlearning techniques and introduces Scope-aware Unlearning, a novel method to localize unlearning effects and enhance robustness.

Findings

01. Stealthy Attack can induce unlearning behaviors with benign tokens
02. Scope-aware Unlearning effectively reduces utility degradation
03. Proposed method seamlessly integrates with existing frameworks

Abstract

Although Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of tasks, growing concerns have emerged over the misuse of sensitive, copyrighted, or harmful data during training. To address these concerns, unlearning techniques have been developed to remove the influence of specific data without retraining from scratch. However, this paper reveals a critical vulnerability in fine-tuning-based unlearning: a malicious user can craft a manipulated forgetting request that stealthily degrades the model's utility for benign users. We demonstrate this risk through a red-teaming Stealthy Attack (SA), which is inspired by two key limitations of existing unlearning (the inability to constrain the scope of unlearning effect and the failure to distinguish benign tokens from unlearning signals). Prior work has shown that unlearned models tend to memorize…

Taxonomy

Topics: Law, AI, and Intellectual Property · Corporate Insolvency and Governance · Private Equity and Venture Capital