Mirror Mirror on the Wall, Have I Forgotten it All? A New Framework for Evaluating Machine Unlearning

Brennon Brimhall; Philip Mathew; Neil Fendley; Yinzhi Cao; Matthew Green

arXiv:2505.08138·cs.LG·May 14, 2025

TL;DR

This paper introduces a formal framework, computational unlearning, for evaluating machine unlearning methods. It demonstrates that models produced by many existing approaches can be distinguished from retrained models, which calls the effectiveness of those approaches into question.

Contribution

The paper proposes a rigorous definition of computational unlearning, analyzes the limitations of current unlearning methods, and explores the definition's relationship to differential privacy, providing a foundation for future research.

Findings

01. Adversaries can distinguish unlearned models from retrained models using existing metrics.

02. Computational unlearning cannot be achieved by deterministic methods for entropic learning algorithms.

03. Differential privacy-based unlearning approaches require extreme utility trade-offs.

Abstract

Machine unlearning methods take a model trained on a dataset and a forget set, then attempt to produce a model as if it had only been trained on the examples not in the forget set. We empirically show that an adversary is able to distinguish between a mirror model (a control model produced by retraining without the data to forget) and a model produced by an unlearning method across representative unlearning methods from the literature. We build distinguishing algorithms based on evaluation scores in the literature (i.e., membership inference scores) and Kullback-Leibler divergence. We propose a strong formal definition for machine unlearning called computational unlearning. Computational unlearning is defined as the inability of an adversary to distinguish between a mirror model and a model produced by an unlearning method. If the adversary cannot guess better than random (except with…
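
The distinguishing game described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' implementation. The function names, the synthetic "models" (random class-probability vectors on a forget set), the `shift` parameter standing in for residual memorization, the fixed threshold, and the choice to give the adversary a freshly sampled mirror model as a reference are all assumptions made for illustration. It only shows the shape of a Kullback-Leibler-based distinguisher: compute a divergence score, guess which kind of model was presented, and measure how far the success rate sits above random guessing.

```python
import math
import random


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_model_outputs(rng, shift, n_examples=20, n_classes=3, noise=0.05):
    """Hypothetical stand-in for a model: class probabilities on forget-set examples.

    shift=0 imitates a mirror model (never saw the forget set); shift>0 imitates
    an unlearned model that still leans toward the forgotten label (class 0).
    """
    outputs = []
    for _ in range(n_examples):
        logits = [rng.gauss(0.0, noise) for _ in range(n_classes)]
        logits[0] += shift
        outputs.append(softmax(logits))
    return outputs


def kld_score(reference, challenge):
    """Mean per-example KL divergence from reference outputs to challenge outputs."""
    return sum(kl_divergence(r, c) for r, c in zip(reference, challenge)) / len(reference)


def play_round(rng, shift, threshold):
    """One game round: the challenger picks b, the adversary guesses b from the score."""
    b = rng.randint(0, 1)  # 0 = mirror model, 1 = unlearned model
    reference = toy_model_outputs(rng, shift=0.0)  # adversary's own mirror model (assumed)
    challenge = toy_model_outputs(rng, shift=shift if b == 1 else 0.0)
    guess = 1 if kld_score(reference, challenge) > threshold else 0
    return guess == b


def estimate_advantage(shift, threshold=0.01, rounds=500, seed=0):
    """Empirical success rate minus 1/2; near 0 means no better than random guessing."""
    rng = random.Random(seed)
    wins = sum(play_round(rng, shift, threshold) for _ in range(rounds))
    return wins / rounds - 0.5
```

In this toy setting, `estimate_advantage(0.0)` should hover near zero because the challenge model is always statistically identical to a mirror model, while a clearly nonzero `shift` lets the threshold test win almost every round. A real evaluation would replace `toy_model_outputs` with the predictions of trained networks and would need a principled way to choose the threshold.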

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01·Rating 2·Confidence 2

Strengths

1. The authors present a very interesting perspective on machine unlearning evaluation. 2. It is commendable that the authors incorporate knowledge from other domains into the machine learning context, as mentioned in Line 187.

Weaknesses

1. The presentation of this paper is difficult to follow. For example, it would be clearer to present the content in Figure 1 using an algorithmic block or pseudocode format. 2. The baseline methods selected by the authors are not sufficiently representative. Several well-known unlearning methods, such as SalUn [1] and $\ell_1$-sparse [2], could be included for a more comprehensive comparison. 3. It would be helpful to include a notation table in the Appendix, as many symbols and variables are u

Reviewer 02·Rating 2·Confidence 3

Strengths

1. The development of rigorous evaluation methodologies for machine unlearning is a critically important research direction. 2. The framework leverages two distinguisher scores, including a Membership Inference Attack (MIA)-based metric and a Kullback-Leibler (KL) divergence-based metric.

Weaknesses

1. The empirical validation seems to be limited to a single model (ResNet-18) and dataset (CIFAR-10), which restricts the generalizability of the claims. 2. The presentation, particularly in the methodology section, lacks clarity and is difficult to follow. 3. Retraining from scratch inherently involves stochasticity. The methodology appears not to explicitly account for the randomness in the training dynamics. 4. The employed Membership Inference Attack can be strengthened. State-of-the-art app

Reviewer 03·Rating 2·Confidence 3

Strengths

1. The paper proposes a framework based on cryptographic theory for evaluating unlearning methods, which rigorously defines the criterion for successful unlearning as computational indistinguishability between the unlearned model and a control model. 2. The paper makes strong theoretical contributions by leveraging a security game framework rooted in cryptography to rigorously define and evaluate machine unlearning. The conclusion that none of the deterministic, heuristic unlearning methods can achieve com

Weaknesses

1. The computational unlearning evaluation only serves to assess the efficacy of unlearning by measuring the posterior difference (MIAScore and KLDScore) from the ideal retrained model, without offering constructive insights or a method for developing better approximate unlearning solutions. 2. The paper lacks a dedicated comparison or discussion of its framework against prior machine unlearning evaluation works that similarly employ game-theoretic or cryptography-inspired methodologies [1].
