TL;DR
This paper introduces Smoothed Gradient Ascent (SGA), a novel method for LLM unlearning that stabilizes gradient updates by blending forget and normal data, leading to improved performance and utility preservation.
Contribution
The paper proposes SGA, a new unlearning technique that improves the stability and effectiveness of gradient ascent by smoothing updates with normal data, supported by theoretical and empirical validation.
Findings
SGA outperforms standard GA across all benchmarks.
SGA ranks in the top two when compared with baseline methods.
Theoretical guidance improves smoothing rate selection.
Abstract
LLM unlearning has emerged as a promising approach, aiming to enable models to forget hazardous/undesired knowledge at low cost while preserving as much model utility as possible. Among existing techniques, the most straightforward method is performing Gradient Ascent (GA) w.r.t. the forget data, thereby forcing the model to unlearn the forget dataset. However, GA suffers from severe instability, as it drives updates in a divergent direction, often resulting in drastically degraded model utility. To address this issue, we propose Smoothed Gradient Ascent (SGA). SGA combines the forget data with multiple constructed normal data through a tunable smoothing rate. Intuitively, this extends GA from learning solely on the forget data to jointly learning across both forget and normal data, enabling more stable unlearning while better preserving model utility. Theoretically, we provide the…
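The blending idea in the abstract can be illustrated with a toy sketch. The excerpt above does not state the exact SGA objective, so the convex combination below is an assumption made for illustration, not the authors' formulation: the forget term is ascended (its loss is negated), the normal term is averaged over several normal samples and descended, and a smoothing rate `r` in [0, 1] weights the two. The names `nll` and `blended_unlearning_loss` are hypothetical.

```python
import math
from typing import Sequence


def nll(logits: Sequence[float], target: int) -> float:
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def blended_unlearning_loss(
    forget_nll: float, normal_nlls: Sequence[float], r: float
) -> float:
    """Hypothetical blend of forget and normal losses (an assumption,
    not the paper's stated objective).

    Minimizing the returned value increases the forget loss (gradient
    ascent on forget data) while decreasing the mean loss on the
    constructed normal samples.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("smoothing rate r must lie in [0, 1]")
    if not normal_nlls:
        raise ValueError("at least one normal sample is required")
    mean_normal = sum(normal_nlls) / len(normal_nlls)
    return -(1.0 - r) * forget_nll + r * mean_normal
```

Under this assumed form, `r = 0` reduces to plain GA on the forget data, and `r = 1` reduces to ordinary training on the normal data alone; intermediate values trade forgetting strength against stability.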
Peer Reviews
Decision · Submitted to ICLR 2026
- The paper is very well written. In particular, the authors provide proper background on LLM unlearning and analyse the problems of existing methods well, which helps in understanding the motivation for the proposed method.
- Although the method might be very simple, the authors justify their arguments not just with empirical results but also with some theoretical analysis.
- The authors claim, when discussing the limitations of existing methods, that identifying a suitable retain set is not feasible. How can we confirm that the generated normal dataset is a 'suitable' retain set, especially when it relies on other LLMs that cannot be assured to behave properly?
- It looks like (3) and (1) are basically equivalent, just different realisations of the given objective. If so, will they yield the same results if the 'retain set' is pr
- Overall, SGA is a novel method that addresses GA’s instability in LLM unlearning. It tackles a key problem in LLM unlearning, improving the forgetting-retention balance. The theoretical analysis of the optimal smoothing rate (r*) provides a useful mathematical framework.
- The paper is well written and structured, with clear explanations and figures.
- Experiments are thorough across three diverse benchmarks with strong baselines, appropriate metrics, and ablation studies that support the method’s e
- The authors identify the optimal smoothing rate (r*) as dynamic, but in practice it is fixed during training. Furthermore, the results show that r* varies quite a bit across different smoothing rates and models, and some values even cause training to collapse. Have the authors explored methods that compute r* dynamically or periodically during training? Could this improve results?
- SGA relies quite a bit on normal data, which is generated either via embedding similarity or external mod
The idea is clear and makes sense if one only wants to unlearn some data from the trained model. The results are convincing as far as I can tell, especially the selection of models is sufficiently large and diverse. There is some originality in combining label smoothing, normal data generation, and gradient ascent.
Overall, the quality of this work should be enhanced. 1. Retaining is too weak. Machine unlearning is actually a multi-task problem: it is not only about unlearning, but also about retaining utility. However, GA (and this SGA) is not good at retaining at all. In Table 1, SGA has almost zero FQ, which is not remotely comparable to the retained model for llama2 and phi, which has 1.0 FQ. Also in Table 3, KnowMem on Dr (↑), i.e. utility on the retain data, is 1.94, whereas the retained model is 55, Kl is 48.
