Robust LLM Unlearning Against Relearning Attacks: The Minor Components in Representations Matter
Zeguan Xiao, Xuanzhe Xu, Yun Chen, Yong Wang, Jian Yang, Yanqing Hu, Guanhua Chen

TL;DR
This paper investigates the vulnerability of LLM unlearning to relearning attacks, revealing the importance of minor components in representations and proposing a new method targeting them for improved robustness.
Contribution
The paper uncovers the role of minor components in representation geometry for unlearning robustness and introduces Minor Component Unlearning (MCU), a novel approach targeting these components.
Findings
Existing unlearning methods mainly optimize dominant components.
Changes made along minor components are more resistant to reversal by relearning attacks than changes along dominant components.
MCU significantly improves resistance to relearning attacks.
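The dominant/minor split in the findings above can be illustrated with a small sketch. This summary does not say how the paper defines components, so the sketch assumes they are principal directions (PCA via SVD) of a matrix of hidden representations. The data are synthetic, and the names `split_by_components`, `energy_fraction`, `reps`, `delta`, and `k` are hypothetical. It is not the MCU method itself. It only shows how one might measure how much of a representation shift falls in the top-k directions versus the rest.

```python
import numpy as np

def split_by_components(reps, delta, k):
    """Split `delta` into the part lying in the top-k principal directions
    of `reps` (assumed "dominant") and the remainder (assumed "minor")."""
    centered = reps - reps.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    dominant_basis = vt[:k]                               # shape (k, d)
    dominant_part = delta @ dominant_basis.T @ dominant_basis
    minor_part = delta - dominant_part                    # orthogonal remainder
    return dominant_part, minor_part

def energy_fraction(part, whole):
    """Fraction of the squared norm of `whole` carried by `part`."""
    return float(np.sum(part ** 2) / np.sum(whole ** 2))

rng = np.random.default_rng(0)
n, d, k = 256, 16, 4
# Synthetic stand-in for hidden states: variance decays quickly across dimensions.
scales = 2.0 ** -np.arange(d)
reps = rng.normal(size=(n, d)) * scales
# Hypothetical representation shift concentrated in the high-variance directions.
delta = rng.normal(size=(n, d)) * scales

dom, minor = split_by_components(reps, delta, k)
dom_frac = energy_fraction(dom, delta)
minor_frac = energy_fraction(minor, delta)
print(f"dominant share: {dom_frac:.3f}, minor share: {minor_frac:.3f}")
```

In this toy setup almost all of the shift's energy sits in the top-k subspace, which mirrors the qualitative claim that existing methods mostly move dominant components. Whether real unlearning updates behave this way is what the paper investigates, and this sketch does not reproduce that result.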
Abstract
Large language model (LLM) unlearning aims to remove specific data influences from a pre-trained model without costly retraining, addressing privacy, copyright, and safety concerns. However, recent studies reveal a critical vulnerability: unlearned models rapidly recover "forgotten" knowledge through relearning attacks. This fragility raises serious security concerns, especially for open-weight models. In this work, we investigate the fundamental mechanism underlying this fragility from a representation geometry perspective. We discover that existing unlearning methods predominantly optimize along dominant components, leaving minor components largely unchanged. Critically, during relearning attacks, the modifications in these dominant components are easily reversed, enabling rapid knowledge recovery, whereas minor components exhibit stronger resistance to such reversal. We further provide…
