Invariance Makes LLM Unlearning Resilient Even to Unanticipated Downstream Fine-Tuning
Changsheng Wang, Yihua Zhang, Jinghan Jia, Parikshit Ram, Dennis Wei, Yuguang Yao, Soumyadeep Pal, Nathalie Baracaldo, Sijia Liu

TL;DR
This paper introduces invariant LLM unlearning (ILU), a novel regularization framework inspired by invariant risk minimization, which enhances the robustness of unlearning in large language models against unanticipated downstream fine-tuning.
Contribution
The paper proposes ILU, the first invariant risk minimization-based approach for LLM unlearning, improving robustness and generalization to diverse fine-tuning tasks.
Findings
ILU outperforms state-of-the-art unlearning methods on the WMDP and MUSE benchmarks.
ILU maintains unlearning effectiveness across various downstream tasks.
ILU preserves fine-tuning performance while enhancing robustness.
Abstract
Machine unlearning offers a promising solution to privacy and safety concerns in large language models (LLMs) by selectively removing targeted knowledge while preserving utility. However, current methods are highly sensitive to downstream fine-tuning, which can quickly recover forgotten information, even from unrelated tasks. To address this, we introduce invariance into unlearning for the first time, inspired by invariant risk minimization (IRM). Building on this principle, we propose invariant LLM unlearning (ILU), a regularization-based framework that enhances robustness. Notably, ILU generalizes well to diverse fine-tuning tasks, even when trained using a single dataset. A task vector analysis is also provided to further elucidate the rationale behind ILU's effectiveness. Extensive experiments on the WMDP and MUSE benchmarks reveal that ILU significantly outperforms state-of-the-art…
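The abstract does not spell out the ILU objective, so the following is only an orientation sketch of the kind of invariance regularizer found in the IRM literature (the IRMv1 penalty), not the authors' implementation. It assumes a toy scalar predictor with squared-error loss; the names `base_loss`, `environments`, and `lam` are hypothetical. In an unlearning setting one might take `base_loss` to be the unlearning loss and each environment to be a fine-tuning dataset, but that mapping is an assumption here.

```python
# Illustrative IRMv1-style invariance penalty on a toy predictor.
# Assumption: squared-error risk; this is NOT the paper's ILU code.

def risk_grad_wrt_scale(preds, targets):
    """Gradient of mean((w * p - y)^2) with respect to a dummy scale w, at w = 1."""
    n = len(preds)
    return sum(2.0 * (p - y) * p for p, y in zip(preds, targets)) / n


def irm_penalty(preds, targets):
    """Squared gradient: zero when the predictor is already optimal under rescaling."""
    g = risk_grad_wrt_scale(preds, targets)
    return g * g


def invariant_objective(base_loss, environments, lam=1.0):
    """base_loss plus lam times the summed per-environment penalties.

    `environments` is a list of (preds, targets) pairs, one per environment.
    """
    return base_loss + lam * sum(irm_penalty(p, y) for p, y in environments)
```

A predictor that fits an environment exactly contributes no penalty, while one that is far from optimal in some environment is penalized in proportion to `lam`. With a real model, the same gradient would typically be obtained by automatic differentiation rather than in closed form.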
