FaLW: A Forgetting-aware Loss Reweighting for Long-tailed Unlearning

Liheng Yu; Zhe Zhao; Yuxuan Wang; Pengkun Wang; Xiaofeng Cao; Binwu Wang; Yang Wang

arXiv:2601.18650·cs.LG·February 24, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces FaLW, a novel loss reweighting method for long-tailed data unlearning that effectively addresses challenges posed by imbalanced forget sets, improving unlearning performance.

Contribution

FaLW is the first method to specifically handle long-tailed unlearning scenarios by dynamically reweighting samples based on their unlearning state, improving efficiency and accuracy.

Findings

01

FaLW outperforms existing unlearning methods in long-tailed settings.

02

It effectively mitigates heterogeneous and skewed unlearning deviations.

03

Experimental results show significant performance improvements.

Abstract

Machine unlearning, which aims to efficiently remove the influence of specific data from trained models, is crucial for upholding data privacy regulations like the "right to be forgotten". However, existing research predominantly evaluates unlearning methods on relatively balanced forget sets. This overlooks a common real-world scenario where data to be forgotten, such as a user's activity records, follows a long-tailed distribution. Our work is the first to investigate this critical research gap. We find that in such long-tailed settings, existing methods suffer from two key issues: Heterogeneous Unlearning Deviation and Skewed Unlearning Deviation. To address these challenges, we propose FaLW, a plug-and-play, instance-wise dynamic loss reweighting method. FaLW innovatively assesses the unlearning state of each sample by comparing its predictive probability to the…
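The abstract is cut off before it names what each sample's predictive probability is compared against, so the following is only a rough sketch of instance-wise loss reweighting in general, not the paper's formula. It assumes a hypothetical per-class reference value `class_ref` (for example, the mean true-class probability on unseen data of the same class, a guess prompted by Reviewer 01's remark about requiring unseen same-class data), a hypothetical `tanh` weighting with a `temperature` parameter, and a gradient-ascent style forget objective. All function names are illustrative.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def forgetting_aware_weights(probs_true, labels, class_ref, temperature=0.1):
    """Hypothetical per-sample weights in (0, 2); NOT the paper's rule.

    probs_true: model probability of each forget sample's true class.
    class_ref:  assumed per-class reference probability (illustrative).
    A sample whose confidence is still above the reference is treated as
    under-forgotten and gets a weight above 1; a sample already below the
    reference is treated as over-forgotten and gets a weight below 1.
    """
    gap = probs_true - class_ref[labels]
    return 1.0 + np.tanh(gap / temperature)


def reweighted_forget_loss(logits, labels, class_ref, temperature=0.1):
    """Weighted gradient-ascent style forget loss (an assumed base objective).

    Minimising log p(true class) is equivalent to maximising cross-entropy
    on the forget samples; the weights rescale each sample's contribution.
    """
    probs = softmax(logits)
    probs_true = probs[np.arange(len(labels)), labels]
    weights = forgetting_aware_weights(probs_true, labels, class_ref, temperature)
    per_sample = np.log(probs_true + 1e-12)
    return float(np.mean(weights * per_sample)), weights


# Toy usage: three forget samples from two classes.
logits = np.array([[2.0, 0.0], [0.0, 0.0], [1.0, -1.0]])
labels = np.array([0, 0, 1])
class_ref = np.array([0.5, 0.4])  # illustrative reference values
loss, weights = reweighted_forget_loss(logits, labels, class_ref)
```

In this toy setup the first sample (still confidently predicted) receives the largest weight and the third (already well below its class reference) the smallest, which is the qualitative behaviour an instance-wise scheme is meant to provide when head and tail classes forget at different rates.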

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

The paper studies a novel problem arising in the context of unlearning and proposes a novel solution to address it. The problem addressed is quite relevant and practical. The paper demonstrates empirically the unlearning deviation problem under long-tailed distribution setups, and defines the problem clearly, proposing two kinds of unlearning deviation. The proposed FaLW is sound, addressing the identified problem to the extent possible. Strong empirical results on several real worl

Weaknesses

The methodology addresses the problem to a good extent but suffers from some drawbacks: 1. The requirement to have unseen data points from the same class might be impractical; in practice such auxiliary data may not be available. 2. FaLW does not provide a formal guarantee or certification that the influence of the forget set is removed. 3. The definition of unlearning deviation in the paper involves a threshold $\tau_i$, but in the proposed weighting scheme the paper seems to have ignored this

Reviewer 02 · Rating 6 · Confidence 2

Strengths

1. The paper highlights an under-explored but practically important phenomenon in machine unlearning that the forgotten data often follows a long-tailed distribution. The problem is important and the motivation of the work is clear. 2. The formulation of Heterogeneous Unlearning Deviation (HUD) and Skewed Unlearning Deviation (SUD) provides a structured way to analyze performance degradation in unlearning systems, which offers a useful framing for future work. 3. The proposed FaLW is simple b

Weaknesses

1. Lack of empirical validation for plug-and-play claim: Although the proposed FaLW (Forgetting-Aware Loss Reweighting) is described as a plug-and-play solution, the paper only evaluates FaLW as a standalone framework. There are no experiments demonstrating its integration into other existing unlearning methods. 2. Limited analysis of the identified issues HUD and SUD: The paper identified two important issues: Heterogeneous Unlearning Deviation (HUD) and Skewed Unlearning Deviation (SUD) as ke

Reviewer 03 · Rating 6 · Confidence 4

Strengths

1. The paper is, to the best of my knowledge, the first to explicitly formulate long-tailed forget sets (not long-tailed training data) and to show that existing approximate unlearning methods exhibit heterogeneous and skewed unlearning deviations under this realistic setting. This is an underexplored but practical scenario. 2. The proposed FaLW is conceptually simple, instance-wise, and orthogonal to most gradient-based unlearning pipelines. It can be adopted with minor code changes. 3. The dir

Weaknesses

1. Limited theoretical justification – while the adaptive weighting function is motivated by uncertainty, the derivation remains heuristic. The paper lacks formal analysis or convergence guarantees explaining why the proposed weighting yields more reliable unlearning. 2. Ablation insufficiency – although the paper reports a few ablations, it does not disentangle the specific contributions of the uncertainty term versus the similarity term in the weighting function. 3. Lack of comparison with rec

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Ethics and Social Impacts of AI · Domain Adaptation and Few-Shot Learning