Why Some Models Resist Unlearning: A Linear Stability Perspective

Wei-Kai Chang; Rajiv Khanna

arXiv:2602.02986·cs.LG·February 4, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces a linear stability framework to understand why certain models resist unlearning, linking data geometry, memorization, and optimization dynamics through theoretical analysis and empirical validation.

Contribution

It provides the first theoretical analysis of unlearning stability using asymptotic linear stability and data coherence, connecting memorization levels to unlearning difficulty.

Findings

01

Models with higher memorization resist unlearning due to increased data coherence.

02

We identify stability thresholds that distinguish between successful and failed unlearning.

03

Experiments using Hessian analysis and heatmaps validate the theoretical stability boundaries.

Abstract

Machine unlearning, the ability to erase the effect of specific training samples without retraining from scratch, is critical for privacy, regulation, and efficiency. However, most progress in unlearning has been empirical, with little theoretical understanding of when and why unlearning works. We tackle this gap by framing unlearning through the lens of asymptotic linear stability to capture the interaction between optimization dynamics and data geometry. The key quantity in our analysis is data coherence, the cross-sample alignment of loss-surface directions near the optimum. We decompose coherence along three axes (within the retain set, within the forget set, and between them) and prove tight stability thresholds that separate convergence from divergence. To further link data properties to forgettability, we study a two-layer ReLU CNN under a signal-plus-noise model and show…
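The stability idea in the abstract can be illustrated with a toy linear model. The sketch below is not the paper's analysis or its coherence measure: it assumes per-sample quadratic losses 0.5 * (x_i · w)^2 around a shared minimum at w = 0, full-batch gradient descent on the retain set combined with gradient ascent on the forget set, and hypothetical names (`mean_hessian`, `update_spectral_radius`, `ascent_weight`) chosen only for this example. Under those assumptions each step is the linear map w ← (I - lr · (H_retain - ascent_weight · H_forget)) w, so the iterates escape the minimum only if the spectral radius of that map exceeds 1.

```python
import numpy as np


def mean_hessian(X):
    """Average Hessian of the per-sample losses 0.5 * (x_i @ w) ** 2."""
    X = np.atleast_2d(X)
    return X.T @ X / X.shape[0]


def update_spectral_radius(X_retain, X_forget, lr=0.1, ascent_weight=0.5):
    """Spectral radius of J = I - lr * (H_retain - ascent_weight * H_forget).

    J is the one-step linear update for descent on the retain loss plus
    ascent on the forget loss (toy assumption, not the paper's setup).
    """
    H = mean_hessian(X_retain) - ascent_weight * mean_hessian(X_forget)
    J = np.eye(H.shape[0]) - lr * H
    # J is symmetric, so eigvalsh applies.
    return float(np.max(np.abs(np.linalg.eigvalsh(J))))


# Retain samples all point along the first coordinate axis.
retain = np.array([[1.0, 0.0], [1.0, 0.0]])
# Forget sample aligned with the retain samples (high alignment).
aligned_forget = np.array([[1.0, 0.0]])
# Forget sample orthogonal to the retain samples (no alignment).
orthogonal_forget = np.array([[0.0, 1.0]])

rho_aligned = update_spectral_radius(retain, aligned_forget)
rho_orthogonal = update_spectral_radius(retain, orthogonal_forget)

print("aligned forget sample:    rho =", rho_aligned)
print("orthogonal forget sample: rho =", rho_orthogonal)
```

In this toy setting the aligned forget sample leaves the spectral radius at 1, because descent on the retain set cancels the ascent direction, while the orthogonal forget sample pushes it above 1 and the iterates leave the minimum along the forget direction.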

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

I think the perspective provided in this work is very interesting and could lead to a prescriptive theory of unlearning with gradient ascent and descent. The unlearning coherence metric, which explains when ascent on the forget set is neutralized by descent on the retain set, seems actionable.

Weaknesses

I am not fully convinced by the definition of unlearning proposed in the paper. Consider a scenario where all per-example gradients (and effectively the per-sample curvatures) are aligned; adding or removing a sample doesn’t change the optimization path or the final solution, so by a deletion standard the model is already “unlearned” without any unlearning step, yet the paper’s coherence lens would label this case as resistant (high coherence ⇒ no escape), which contradicts the intended notion

Reviewer 02 · Rating 4 · Confidence 4

Strengths

1. The paper's primary strength is its novel application of linear stability analysis to machine unlearning. Framing unlearning as a problem of escaping a local minimum, rather than standard training, is a powerful conceptual shift. This provides a principled lens to analyze the underlying dynamics, moving the field away from purely empirical observations.

2. The introduction of "data coherence" as a measure of Hessian alignment is a key contribution. It provides an intuitive and quantifiable w

Weaknesses

1. The Linear Approximation is a Strong Limitation: The entire framework is built upon a local linear approximation of the loss landscape around a minimum (w*). This assumption is fragile in deep learning. The unlearning process, especially when successful (divergent), inherently moves parameters far from this local region, invalidating the approximation. Furthermore, it fails to account for the flat, wide minima common in modern networks where the Hessian may be ill-conditioned.

2. Definition o

Reviewer 03 · Rating 4 · Confidence 2

Strengths

Strong points of the paper include the soundness of the framework, the innovative use of linear stability to improve ML unlearning with forget and retain sets, and a relatively easy-to-follow organization.

Weaknesses

The main pitfall of the paper is the minimal space allocated to discussing the experiments, which account for a large portion of the paper's findings; this leads to confusion and leaves the results, as they appear in the paper, inconclusive. There is not enough discussion of the experimental results, and the graphs are hard to read when considering what they are trying to convey. Describing them in more depth would help readers' comprehension. While the main purpose

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications