Label Smoothing Improves Machine Unlearning
Zonglin Di, Zhaowei Zhu, Jinghan Jia, Jiancheng Liu, Zafar Takhirov, Bo Jiang, Yuanshun Yao, Sijia Liu, Yang Liu

TL;DR
This paper introduces UGradSL, a simple gradient-based machine unlearning method leveraging label smoothing, which significantly improves unlearning accuracy with minimal additional computation, validated across diverse datasets.
Contribution
The paper proposes UGradSL, a novel plug-and-play unlearning approach that uses inverse label smoothing, supported by theoretical analysis and extensive experiments demonstrating its effectiveness.
Findings
UGradSL improves unlearning accuracy by 66% over baseline.
The method maintains unlearning efficiency with marginal computational cost.
Extensive experiments confirm robustness across various datasets.
Abstract
The objective of machine unlearning (MU) is to eliminate previously learned data from a model. However, it is challenging to strike a balance between computation cost and performance when using existing MU techniques. Taking inspiration from the influence of label smoothing on model confidence and differential privacy, we propose a simple gradient-based MU approach that uses an inverse process of label smoothing. This work introduces UGradSL, a simple, plug-and-play MU approach that uses smoothed labels. We provide theoretical analyses demonstrating why properly introducing label smoothing improves MU performance. We conducted extensive experiments on six datasets of various sizes and different modalities, demonstrating the effectiveness and robustness of our proposed method. The consistent improvement in MU performance is only at a marginal cost of additional computations. For…
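To make the idea in the abstract concrete, here is a minimal sketch of label smoothing combined with a gradient-ascent step on a forget sample. It is not the authors' UGradSL implementation: the function names, the use of raw logits in place of model parameters, and the reading of the "inverse process" as a negative smoothing rate are all assumptions made for illustration.

```python
import math


def smooth_label(label, num_classes, alpha):
    """Standard label smoothing: (1 - alpha) * one_hot + alpha / K.

    Assumption: a negative alpha is used here as one possible reading of
    the "inverse" direction; the paper's exact formulation may differ.
    """
    uniform = alpha / num_classes
    return [(1.0 - alpha) * (1.0 if k == label else 0.0) + uniform
            for k in range(num_classes)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy between softmax(logits) and a (soft) target vector."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs))


def ascent_step_on_logits(logits, target, lr):
    """One gradient-ascent step on the cross-entropy w.r.t. the logits.

    For targets that sum to 1, d(CE)/d(logits) = softmax(logits) - target,
    so ascent moves the logits along +(p - t), raising the loss.
    """
    probs = softmax(logits)
    return [z + lr * (p - t) for z, p, t in zip(logits, probs, target)]


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]                 # hypothetical model output
    target = smooth_label(0, 3, alpha=0.1)    # smoothed label for class 0
    updated = ascent_step_on_logits(logits, target, lr=0.5)
    print(cross_entropy(logits, target), cross_entropy(updated, target))
```

In a real model the ascent step would update network weights rather than logits, and the smoothing rate would be a tuned hyperparameter; the sketch only shows how a smoothed target changes the direction of the forgetting gradient.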
Peer Reviews
Decision·ICLR 2026 Poster
1. The paper has clear problem framing, and the motivation for introducing label smoothing to stabilize gradient ascent is sound.
2. The theoretical analysis is clear and provides useful insights.
3. The proposed UGradSL is simple yet effective.
4. The experiments cover class-level, random, and group unlearning, with sufficient datasets and baselines.
1. Forget-set size: The forget-set size appears fixed for each experiment. Varying the forget-set size is important for assessing the effectiveness of the method, as it would reveal whether the method scales well when more data needs to be forgotten.
2. Lack of ablation study/sensitivity analysis: The paper lacks a discussion of the contribution of each term in the mixed gradient objective. For example, results for different p should be provided, since it is an important factor used to b
+ The paper is clear, logical, and easy to understand. The tables and figure are clear and detailed, with good instructions.
+ The proposed method is a simple plug-and-play tool that can be integrated directly into existing gradient-based unlearning methods (such as GA and fine-tuning) and improve their performance.
+ The paper provides mathematical and theoretical proofs to explain the limitations of existing GA methods and the effectiveness of NLS in specific circumstances. Moreover, the auth
- Although the baseline methods are representative, the experiments lack comparison with the latest schemes from 2024 and 2025.
- Some symbols and formulas lack precise definitions and explanations or contain clerical errors, such as "distance d()" in Algorithm 1, where different distance calculations can lead to different trade-offs between computational overhead and performance.
- The performance of the method is sensitive to hyperparameter settings (such as p and α in Eq. 8), which may resu
1. Combining label smoothing with a gradient-based method is an interesting way to handle unlearning.
2. Both theoretical and experimental evidence support the effectiveness of the proposed method.
1. The presentation of the technical sections is not clear; for example, why is there an $\approx$ symbol in the condition of Theorem 1, and why does $\epsilon$ in the conclusion of $\epsilon$-Label-LDP rely on the weights $\gamma_1$ and $\gamma_2$?
2. The proposed method requires computing distances to samples in the minibatch, which will lead to a large computational cost.
3. The proposed UGradSL does not perform better, while UGradSL+ requires a longer time than other basel
Taxonomy
Topics: Control Systems in Engineering
Methods: Label Smoothing
