Machine Unlearning Fails to Remove Data Poisoning Attacks

Martin Pawelczyk; Jimmy Z. Di; Yiwei Lu; Gautam Kamath; Ayush Sekhari; Seth Neel

arXiv:2406.17216·cs.LG·January 16, 2026

TL;DR

This paper evaluates existing approximate machine unlearning methods and finds that they fail to remove the effects of data poisoning across indiscriminate, targeted, and Gaussian poisoning attacks, on both image classifiers and LLMs. It argues that unlearning needs to be evaluated more broadly.

Contribution

The paper introduces new evaluation metrics, based on data poisoning, for measuring unlearning efficacy, and shows experimentally that current unlearning methods fall short even when granted a relatively large compute budget.

Findings

1. Existing unlearning methods fail to remove poisoning effects effectively.
2. Unlearning methods offer limited benefits over retraining.
3. New evaluation metrics reveal unlearning shortcomings.

Abstract

We revisit the efficacy of several practical methods for approximate machine unlearning developed for large-scale deep learning. In addition to complying with data deletion requests, one often-cited potential application for unlearning methods is to remove the effects of poisoned data. We experimentally demonstrate that, while existing unlearning methods have been demonstrated to be effective in a number of settings, they fail to remove the effects of data poisoning across a variety of types of poisoning attacks (indiscriminate, targeted, and a newly-introduced Gaussian poisoning attack) and models (image classifiers and LLMs), even when granted a relatively large compute budget. In order to precisely characterize unlearning efficacy, we introduce new evaluation metrics for unlearning based on data poisoning. Our results suggest that a broader perspective, including a wider variety of…
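As a rough illustration of how a Gaussian-poisoning check could be set up, the sketch below adds independent Gaussian noise to each training sample and then scores how strongly a set of per-sample input gradients still aligns with that noise. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names `gaussian_poison` and `alignment_scores`, the normalized inner-product statistic, and the synthetic stand-in gradients are all illustrative choices. In a real experiment the gradients would come from the loss of the unlearned model evaluated at the clean inputs.

```python
import numpy as np


def gaussian_poison(x, sigma, rng):
    """Add i.i.d. Gaussian noise to each sample; return poisoned data and the noise."""
    xi = rng.normal(0.0, sigma, size=x.shape)
    return x + xi, xi


def alignment_scores(input_grads, xi, sigma):
    """Per-sample normalized inner product <xi, g> / (sigma * ||g||).

    If g is independent of xi, each score is a standard normal draw,
    so scores far from zero suggest the noise is still encoded.
    """
    dots = np.sum(input_grads * xi, axis=1)
    norms = np.linalg.norm(input_grads, axis=1)
    return dots / (sigma * norms)


rng = np.random.default_rng(0)
n, d, sigma = 2000, 50, 0.1  # hypothetical sizes, chosen only for the demo

x_clean = rng.normal(size=(n, d))
x_poison, xi = gaussian_poison(x_clean, sigma, rng)

# Stand-in gradients (assumption): synthetic arrays replace real model gradients.
grads_forgotten = rng.normal(size=(n, d))               # unrelated to the noise
grads_memorized = xi + 0.01 * rng.normal(size=(n, d))   # still tracks the noise

null_scores = alignment_scores(grads_forgotten, xi, sigma)
leaky_scores = alignment_scores(grads_memorized, xi, sigma)

print("forgotten: mean", null_scores.mean(), "std", null_scores.std())
print("memorized: mean", leaky_scores.mean())
```

The normalization is a design choice: dividing by `sigma * ||g||` gives the scores a known reference distribution when nothing has leaked, so deviation from a standard normal can be tested directly instead of being compared against an arbitrary threshold.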

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

* Overall I think the exposition of the work was good, and the authors did a good job of surveying the state of the field, and seemed to do a thorough job of experimentation.
* The authors make a convincing point about issues with current machine unlearning evaluations, and experimentally show issues with existing methods.
* The proposed Gaussian poisoning method is pretty intuitive and well motivated, and seems to do a good job at classifying machine unlearning.

Weaknesses

* It seems like Sommer et al. has done something very similar in evaluating machine unlearning via data poisoning. This might also apply to a lesser extent to Marchant et al. and Goel et al.
* The Gaussian poisoning method is intuitive, but it would really have been nice to see some more in depth analysis in a toy setting as to what extent the Gaussian samples are encoded in model weights via gradients.
* It's nice to see your method correlates with other poisoning from Geiping et al. Your met

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. The paper is well-written and introduces all necessary preliminary information clearly.
2. A wide array of machine unlearning algorithms is studied and compared.

Weaknesses

1. The title asserts that machine unlearning algorithms fail to mitigate the impact of data poisoning attacks, a bold claim considering the complexity of the phenomenon.
2. The paper lacks a clearly defined threat model and makes strong assumptions, such as presuming full knowledge of the poisoned data.
3. To robustly evaluate whether unlearning can counteract data poisoning, exploring a wider range of scenarios with varying knowledge levels, additional poisoned models, and diverse poisoni

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. The paper conducted extensive and solid experiments across different unlearning algorithms (at least eight) and various tasks (image and language). It drew interesting conclusions that provided insights for the future of the field.
2. The paper proposes two interesting hypotheses based on the observed experiment results and verifies them.
3. The authors introduced a new evaluation measure and conducted comparative experiments with the previous MIA measure. Thes

Weaknesses

1. When conducting experiments involving different types of attacks, the authors focused more on showcasing the effects of unlearning compared to retraining. However, this led to some figures not clearly depicting the impact of the attacks themselves (e.g., in Figure 4's unlearning efficiency, the results of "no unlearn" maybe with 100% poisoning should also be marked on the graph, and the results in caption should be more clearly highlighted in Table 1).
2. Based on my exploration of methods i

Code & Models

Repositories

martinpawel/openunlearn
pytorch · Official

Taxonomy

Topics: Digital and Cyber Forensics · Cloud Data Security Solutions · Security and Verification in Computing