FaLW: A Forgetting-aware Loss Reweighting for Long-tailed Unlearning
Liheng Yu, Zhe Zhao, Yuxuan Wang, Pengkun Wang, Xiaofeng Cao, Binwu Wang, Yang Wang

TL;DR
This paper introduces FaLW, a novel loss reweighting method for long-tailed data unlearning that effectively addresses challenges posed by imbalanced forget sets, improving unlearning performance.
Contribution
FaLW is the first method to specifically handle long-tailed unlearning scenarios by dynamically reweighting samples based on their unlearning state, improving efficiency and accuracy.
Findings
FaLW outperforms existing unlearning methods in long-tailed settings.
It effectively mitigates heterogeneous and skewed unlearning deviations.
Experimental results show significant performance improvements.
Abstract
Machine unlearning, which aims to efficiently remove the influence of specific data from trained models, is crucial for upholding data privacy regulations like the "right to be forgotten". However, existing research predominantly evaluates unlearning methods on relatively balanced forget sets. This overlooks a common real-world scenario where data to be forgotten, such as a user's activity records, follows a long-tailed distribution. Our work is the first to investigate this critical research gap. We find that in such long-tailed settings, existing methods suffer from two key issues: Heterogeneous Unlearning Deviation and Skewed Unlearning Deviation. To address these challenges, we propose FaLW, a plug-and-play, instance-wise dynamic loss reweighting method. FaLW innovatively assesses the unlearning state of each sample by comparing its predictive probability to the…
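The abstract is cut off before it names the quantity each sample's predictive probability is compared against, so the following is only a minimal sketch of instance-wise loss reweighting in general, not FaLW's actual weighting function. It assumes a hypothetical per-sample reference probability (`reference_probs`) and a hypothetical rule in which samples whose predicted probability is further from that reference receive a larger weight. The function names and the normalization to mean 1 are illustrative choices, not taken from the paper.

```python
from typing import List, Sequence


def deviation_weights(
    probs: Sequence[float],
    reference_probs: Sequence[float],
    eps: float = 1e-8,
) -> List[float]:
    """Hypothetical instance-wise weights (NOT the paper's formula).

    probs[i]           : model's predicted probability for forget sample i
    reference_probs[i] : assumed target probability for that sample
    The weight grows with |probs[i] - reference_probs[i]| and the weights
    are rescaled so that their mean is 1.
    """
    if len(probs) != len(reference_probs):
        raise ValueError("probs and reference_probs must have equal length")
    gaps = [abs(p - r) for p, r in zip(probs, reference_probs)]
    total = sum(gaps)
    n = len(gaps)
    if total < eps:
        # Every sample already sits at its reference: fall back to uniform.
        return [1.0] * n
    return [g * n / total for g in gaps]


def reweighted_loss(
    per_sample_losses: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Mean of per-sample losses scaled by their instance-wise weights."""
    if len(per_sample_losses) != len(weights):
        raise ValueError("losses and weights must have equal length")
    if not per_sample_losses:
        raise ValueError("need at least one sample")
    weighted = [w * l for w, l in zip(weights, per_sample_losses)]
    return sum(weighted) / len(weighted)


if __name__ == "__main__":
    # Three forget samples; the first is furthest from its assumed reference.
    probs = [0.9, 0.5, 0.2]
    reference = [0.2, 0.2, 0.2]
    losses = [1.2, 0.7, 0.1]
    w = deviation_weights(probs, reference)
    print("weights:", w)
    print("reweighted loss:", reweighted_loss(losses, w))
```

Because the weights are recomputed from the model's current predictions, calling `deviation_weights` at every training step makes the reweighting dynamic, and wrapping an existing per-sample unlearning loss with `reweighted_loss` is one way a plug-and-play scheme of this kind could be attached to a gradient-based method.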
Peer Reviews
Decision·ICLR 2026 Poster
The paper studies a novel problem arising in the context of unlearning and proposes a novel solution to address it. The problem addressed is quite relevant and practical. The paper demonstrates empirically the unlearning deviation problem under long-tailed distribution setups, and defines the problem clearly, proposing two kinds of unlearning deviation. The proposed FaLW is sound, addressing the identified problem to the extent possible. Strong empirical results on several real worl
The methodology addresses the problem to a good extent but suffers from some drawbacks: 1. The requirement to have unseen data points from the same class might be impractical; in practice such auxiliary data may not be available. 2. FaLW does not provide a formal guarantee or certification that the influence of the forget set is removed. 3. The definition of unlearning deviation in the paper involves a threshold $\tau_i$, but in the proposed weighting scheme the paper seems to have ignored this
1. The paper highlights an under-explored but practically important phenomenon in machine unlearning that the forgotten data often follows a long-tailed distribution. The problem is important and the motivation of the work is clear. 2. The formulation of Heterogeneous Unlearning Deviation (HUD) and Skewed Unlearning Deviation (SUD) provides a structured way to analyze performance degradation in unlearning systems, which offers a useful framing for future work. 3. The proposed FaLW is simple b
1. Lack of empirical validation for plug-and-play claim: Although the proposed FaLW (Forgetting-aware Loss Reweighting) is described as a plug-and-play solution, the paper only evaluates FaLW as a standalone framework. There are no experiments demonstrating its integration into other existing unlearning methods. 2. Limited analysis of the identified issues HUD and SUD: The paper identifies two important issues, Heterogeneous Unlearning Deviation (HUD) and Skewed Unlearning Deviation (SUD), as ke
1. The paper is, to the best of my knowledge, the first to explicitly formulate long-tailed forget sets (not long-tailed training data) and to show that existing approximate unlearning methods exhibit heterogeneous and skewed unlearning deviations under this realistic setting. This is an underexplored but practical scenario. 2. The proposed FaLW is conceptually simple, instance-wise, and orthogonal to most gradient-based unlearning pipelines. It can be adopted with minor code changes. 3. The dir
1. Limited theoretical justification – while the adaptive weighting function is motivated by uncertainty, the derivation remains heuristic. The paper lacks formal analysis or convergence guarantees explaining why the proposed weighting yields more reliable unlearning. 2. Ablation insufficiency – although the paper reports a few ablations, it does not disentangle the specific contributions of the uncertainty term versus the similarity term in the weighting function. 3. Lack of comparison with rec
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Ethics and Social Impacts of AI · Domain Adaptation and Few-Shot Learning
