Modeling LLM Unlearning as an Asymmetric Two-Task Learning Problem
Zeguan Xiao, Siqing Li, Yong Wang, Xuetao Wei, Jian Yang, Yun Chen, Guanhua Chen

TL;DR
This paper frames LLM unlearning as an asymmetric two-task problem, introduces a novel gradient synthesis framework built on that view, and shows that it improves knowledge removal while preserving model performance.
Contribution
It proposes SAGO, a retention-prioritized gradient synthesis method that keeps the combined update more tightly aligned with the retain gradient and outperforms existing approaches on the WMDP and RWKU benchmarks.
Findings
SAGO achieves tighter alignment with the retain gradient than the PCGrad-based variant.
On the WMDP Bio/Cyber and RWKU benchmarks, SAGO significantly improves removal of the target knowledge.
Reshaping gradient geometry is key to a better balance between unlearning and retention.
Abstract
Machine unlearning for large language models (LLMs) aims to remove targeted knowledge while preserving general capability. In this paper, we recast LLM unlearning as an asymmetric two-task problem: retention is the primary objective and forgetting is an auxiliary one. From this perspective, we propose a retention-prioritized gradient synthesis framework that decouples task-specific gradient extraction from conflict-aware combination. To instantiate the framework, we adapt the established PCGrad method to resolve gradient conflicts and introduce SAGO, a novel retention-prioritized gradient synthesis method. Theoretically, both variants guarantee non-negative cosine similarity with the retain gradient, while SAGO achieves strictly tighter alignment through constructive sign-constrained synthesis. Empirically, on the WMDP Bio/Cyber and RWKU benchmarks, SAGO consistently pushes the Pareto frontier: e.g., on WMDP…
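The abstract states only that PCGrad is adapted so that retention takes priority. The sketch below assumes the natural asymmetric reading: when the forget gradient conflicts with the retain gradient (negative dot product), only the forget gradient is projected onto the normal plane of the retain gradient, and the retain gradient is never modified. It is not SAGO, whose sign-constrained synthesis is not specified here. The helper names (`dot`, `cosine`, `retention_prioritized_pcgrad`) are hypothetical, gradients are flat lists of floats, and both are assumed to be gradients of losses being minimized.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is zero."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)


def retention_prioritized_pcgrad(g_retain: Vector, g_forget: Vector) -> Vector:
    """Hypothetical asymmetric PCGrad combination (an illustration, not SAGO).

    If the forget gradient conflicts with the retain gradient (negative dot
    product), its component along g_retain is removed. The retain gradient is
    never projected, so retention keeps priority.
    """
    conflict = dot(g_forget, g_retain)
    norm_sq = dot(g_retain, g_retain)
    if conflict < 0.0 and norm_sq > 0.0:
        scale = conflict / norm_sq
        # Project g_forget onto the normal plane of g_retain.
        g_forget = [f - scale * r for f, r in zip(g_forget, g_retain)]
    # Sum the retain gradient with the (possibly projected) forget gradient.
    return [r + f for r, f in zip(g_retain, g_forget)]


if __name__ == "__main__":
    g_r = [1.0, 0.0]   # toy retain gradient
    g_f = [-2.0, 1.0]  # toy forget gradient that conflicts with g_r
    naive = [r + f for r, f in zip(g_r, g_f)]
    combined = retention_prioritized_pcgrad(g_r, g_f)
    print(naive, round(cosine(naive, g_r), 4))        # [-1.0, 1.0] -0.7071
    print(combined, round(cosine(combined, g_r), 4))  # [1.0, 1.0] 0.7071
```

In this toy case the naive sum points away from the retain gradient, while the projected combination does not. More generally, after projection the forget component is orthogonal to the retain gradient, so the combined update has dot product equal to the squared norm of the retain gradient. Without a conflict, the dot product is at least that large. Either way, the cosine similarity with the retain gradient is non-negative, which matches the guarantee the abstract states for the PCGrad variant.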
