Robust LLM Unlearning Against Relearning Attacks: The Minor Components in Representations Matter

Zeguan Xiao; Xuanzhe Xu; Yun Chen; Yong Wang; Jian Yang; Yanqing Hu; Guanhua Chen

arXiv:2605.11685·cs.CL·May 13, 2026

TL;DR

This paper investigates the vulnerability of LLM unlearning to relearning attacks, revealing the importance of minor components in representations and proposing a new method targeting them for improved robustness.

Contribution

The paper uncovers the role of minor components in representation geometry for unlearning robustness and introduces Minor Component Unlearning (MCU), a novel approach targeting these components.

Findings

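01 Existing unlearning methods predominantly optimize along dominant components, leaving minor components largely unchanged.

02 During relearning attacks, changes to dominant components are easily reversed, whereas minor components resist such reversal more strongly.

03 MCU significantly improves resistance to relearning attacks.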

Abstract

Large language model (LLM) unlearning aims to remove specific data influences from a pre-trained model without costly retraining, addressing privacy, copyright, and safety concerns. However, recent studies reveal a critical vulnerability: unlearned models rapidly recover "forgotten" knowledge through relearning attacks. This fragility raises serious security concerns, especially for open-weight models. In this work, we investigate the fundamental mechanism underlying this fragility from a representation geometry perspective. We discover that existing unlearning methods predominantly optimize along dominant components, leaving minor components largely unchanged. Critically, during relearning attacks, the modifications in these dominant components are easily reversed, enabling rapid knowledge recovery, whereas minor components exhibit stronger resistance to such reversal. We further provide…
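To make the dominant-versus-minor distinction concrete, the sketch below measures how much of a representation shift falls along the leading principal directions of a set of hidden states and how much falls along the remaining ones. It is an illustration only, not the paper's method: the excerpt does not define the components precisely, so treating them as principal components of the pre-unlearning representations is an assumption, and the function name `split_shift_energy`, the cutoff `k`, and the synthetic data are all hypothetical.

```python
import numpy as np

def split_shift_energy(h_before, h_after, k):
    """Split the squared norm of (h_after - h_before) between the top-k
    principal directions of h_before ("dominant", by assumption) and the
    remaining directions ("minor", by assumption). Returns two fractions."""
    centered = h_before - h_before.mean(axis=0, keepdims=True)
    # Rows of vt form an orthonormal basis ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=True)
    delta = h_after - h_before
    coords = delta @ vt.T                 # shift expressed in the principal basis
    energy = (coords ** 2).sum(axis=0)    # squared shift per direction
    total = energy.sum()
    if total == 0:
        raise ValueError("representations are identical; no shift to split")
    return energy[:k].sum() / total, energy[k:].sum() / total

# Synthetic stand-in for hidden states: 200 samples, 6 dimensions with
# strongly decreasing variance, so the first axes dominate.
rng = np.random.default_rng(0)
scales = np.array([10.0, 5.0, 1.0, 0.5, 0.2, 0.1])
h = rng.normal(size=(200, 6)) * scales

# A hypothetical update that moves every representation along the
# highest-variance axis only.
shift = np.zeros(6)
shift[0] = 1.0
dominant, minor = split_shift_energy(h, h + shift, k=2)
print(f"dominant fraction: {dominant:.4f}, minor fraction: {minor:.4f}")
```

Under these assumptions, a shift concentrated in the dominant fraction would correspond to the behavior the abstract attributes to existing unlearning methods, while a shift with a large minor fraction would correspond to the kind of change the abstract describes as harder to reverse.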
