Unlearn and Burn: Adversarial Machine Unlearning Requests Destroy Model Accuracy
Yangsibo Huang, Daogao Liu, Lynn Chua, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Milad Nasr, Amer Sinha, Chiyuan Zhang

TL;DR
This paper reveals a vulnerability in machine unlearning systems where adversaries can significantly degrade model accuracy by submitting fake unlearning requests, highlighting the need for more robust verification methods.
Contribution
It introduces white-box and black-box attack algorithms that submit unlearning requests for data not in the training set, and evaluates them against a family of widely used unlearning methods on image classification tasks (CIFAR-10 and ImageNet).
Findings
White-box attacks reduce test accuracy to 3.6% on CIFAR-10 and 0.4% on ImageNet
Black-box attacks reduce test accuracy to 8.5% on CIFAR-10
Most verification mechanisms fail to detect stealthy attacks
Abstract
Machine unlearning algorithms, designed for selective removal of training data from models, have emerged as a promising approach to growing privacy concerns. In this work, we expose a critical yet underexplored vulnerability in the deployment of unlearning systems: the assumption that the data requested for removal is always part of the original training set. We present a threat model where an attacker can degrade model accuracy by submitting adversarial unlearning requests for data not present in the training set. We propose white-box and black-box attack algorithms and evaluate them through a case study on image classification tasks using the CIFAR-10 and ImageNet datasets, targeting a family of widely used unlearning methods. Our results show extremely poor test accuracy following the attack: 3.6% on CIFAR-10 and 0.4% on ImageNet for white-box attacks, and 8.5% on CIFAR-10 and 1.3%…
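To make the threat model concrete, here is a toy sketch; it is not the authors' attack algorithms or experimental setup. It assumes a two-feature logistic-regression model and a hypothetical unlearning service that runs a few gradient-ascent steps on the loss of each requested example without checking that the example was ever trained on. Gradient ascent is one common unlearning approach, but treating it as representative of the family studied in the paper is an assumption here. All function names (`train`, `unlearn_gradient_ascent`, `craft_request`, etc.) are illustrative, and the white-box request below is a hand-built toy, not an optimized attack.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_data(n, rng):
    # Toy task: the label depends only on feature 0; feature 1 is pure noise.
    data = []
    for _ in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        data.append((x, 1 if x[0] > 0 else -1))
    return data


def loss_grad(w, x, y):
    # Gradient of the logistic loss log(1 + exp(-y * <w, x>)) w.r.t. w.
    margin = y * (w[0] * x[0] + w[1] * x[1])
    coef = -y * sigmoid(-margin)
    return (coef * x[0], coef * x[1])


def train(data, lr=0.5, epochs=50):
    # Plain SGD on the logistic loss, starting from zero weights.
    w = [0.0, 0.0]
    for _ in range(epochs):
        for x, y in data:
            g = loss_grad(w, x, y)
            w[0] -= lr * g[0]
            w[1] -= lr * g[1]
    return w


def unlearn_gradient_ascent(w, requests, lr=0.5, steps=3):
    # Hypothetical unlearning service: a few gradient-ascent steps on the
    # loss of each requested example, with NO check that the example was
    # ever part of the training set.
    w = list(w)
    for _ in range(steps):
        for x, y in requests:
            g = loss_grad(w, x, y)
            w[0] += lr * g[0]
            w[1] += lr * g[1]
    return w


def craft_request(w, scale=1000.0):
    # Toy white-box request (illustrative only): misclassified under w, so
    # its loss gradient is not saturated, and huge along the noise feature,
    # so each ascent step inflates w[1] and swamps the useful weight w[0].
    sign = 1.0 if w[1] >= 0 else -1.0
    x = (-1.0 if w[0] > 0 else 1.0, -scale * sign)
    return [(x, 1)]


def accuracy(w, data):
    hits = sum(1 for x, y in data if y * (w[0] * x[0] + w[1] * x[1]) > 0)
    return hits / len(data)


rng = random.Random(0)
train_set = make_data(200, rng)
test_set = make_data(1000, rng)

w = train(train_set)
request = craft_request(w)
w_attacked = unlearn_gradient_ascent(w, request)
acc_before = accuracy(w, test_set)
acc_after = accuracy(w_attacked, test_set)
print(f"test accuracy before: {acc_before:.3f}, after: {acc_after:.3f}")
```

In this toy, one non-member request drives the classifier to roughly chance level on a binary task. The crafted point lies far outside the data range (its noise feature is 1000 while training features lie in [-1, 1]), so even a naive input-range check would flag it; the summary above notes that stealthier requests evade most of the verification mechanisms the paper evaluates.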
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Fault Detection and Control Systems
