Unlearn and Burn: Adversarial Machine Unlearning Requests Destroy Model Accuracy

Yangsibo Huang; Daogao Liu; Lynn Chua; Badih Ghazi; Pritish Kamath; Ravi Kumar; Pasin Manurangsi; Milad Nasr; Amer Sinha; Chiyuan Zhang

arXiv:2410.09591 · cs.CR · October 15, 2024

TL;DR

This paper reveals a vulnerability in machine unlearning systems where adversaries can significantly degrade model accuracy by submitting fake unlearning requests, highlighting the need for more robust verification methods.

Contribution

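The paper introduces white-box and black-box attack algorithms that target unlearning methods by submitting removal requests for data that is not in the training set, and it evaluates their effectiveness on image classification tasks using CIFAR-10 and ImageNet.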
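To make the threat model concrete, here is a minimal toy sketch in Python with NumPy. It rests on several assumptions that are not in the source: the unlearning method is taken to be plain gradient ascent on the requested examples (the page does not name the method family), the model is a two-dimensional logistic regression rather than an image classifier, and the forged request is a single hand-picked point rather than the output of the paper's attack algorithms. All function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def accuracy(w, b, x, y):
    # Fraction of examples whose predicted class matches the label.
    return float(np.mean((x @ w + b > 0) == (y == 1)))

def train(x, y, lr=0.1, steps=200):
    # Ordinary gradient descent on the logistic loss.
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(steps):
        err = sigmoid(x @ w + b) - y
        w -= lr * x.T @ err / len(y)
        b -= lr * err.mean()
    return w, b

def unlearn(w, b, x, y, lr=0.1, steps=10):
    # ASSUMPTION: "unlearning" is gradient ascent on the requested examples.
    # The source does not specify the unlearning method.
    w, b = w.copy(), b
    for _ in range(steps):
        err = sigmoid(x @ w + b) - y
        w += lr * x.T @ err / len(y)
        b += lr * err.mean()
    return w, b

def make_data(rng, n):
    # Two well-separated Gaussian blobs, labels 0 and 1.
    x = np.vstack([rng.normal(-2, 0.5, (n, 2)), rng.normal(2, 0.5, (n, 2))])
    y = np.concatenate([np.zeros(n), np.ones(n)])
    return x, y

rng = np.random.default_rng(0)
x_train, y_train = make_data(rng, 200)
x_test, y_test = make_data(rng, 200)

w, b = train(x_train, y_train)
acc_before = accuracy(w, b, x_test, y_test)

# A genuine request: examples that really are in the training set.
w_g, b_g = unlearn(w, b, x_train[:5], y_train[:5])
acc_genuine = accuracy(w_g, b_g, x_test, y_test)

# A forged request: a hand-picked point that was never in the training set.
# It has large norm and sits close to the decision boundary.
forged_x = np.array([[10.0, -10.0]])
forged_y = np.array([1.0])
w_f, b_f = unlearn(w, b, forged_x, forged_y)
acc_forged = accuracy(w_f, b_f, x_test, y_test)

print(f"before: {acc_before:.3f}  genuine: {acc_genuine:.3f}  forged: {acc_forged:.3f}")
```

In this toy setting the genuine request barely moves the model, because the model already fits those points and their gradients are small. The forged point produces large ascent steps that rotate the decision boundary, so test accuracy falls. The sketch only illustrates why accepting requests without checking membership is risky; it says nothing about the magnitudes reported for CIFAR-10 or ImageNet.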

Findings

01. White-box attacks reduce accuracy to 3.6% on CIFAR-10

02. Black-box attacks reduce accuracy to 8.5% on CIFAR-10

03. Most verification mechanisms fail to detect stealthy attacks
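As a point of reference for the verification finding, the most literal check an unlearning service could run is exact-match membership: accept a removal request only if the submitted example is byte-identical to a stored training example. The Python sketch below shows that check under stated assumptions. It is not a mechanism taken from the paper, the helper names `fingerprint`, `build_index`, and `filter_requests` are hypothetical, and examples are assumed to be available as NumPy arrays.

```python
import hashlib
import numpy as np

def fingerprint(x):
    # Hash the raw float32 bytes of one example (hypothetical helper).
    arr = np.ascontiguousarray(x, dtype=np.float32)
    return hashlib.sha256(arr.tobytes()).hexdigest()

def build_index(train_x):
    # Fingerprints of every training example, computed once at training time.
    return {fingerprint(x) for x in train_x}

def filter_requests(index, requests):
    # Keep only requests that exactly match a known training example.
    accepted = [x for x in requests if fingerprint(x) in index]
    rejected = len(requests) - len(accepted)
    return accepted, rejected

rng = np.random.default_rng(0)
train_x = rng.normal(size=(100, 8)).astype(np.float32)
index = build_index(train_x)

genuine = [train_x[3], train_x[42]]
forged = [
    train_x[3] + 1e-3,                       # slightly perturbed training point
    rng.normal(size=8).astype(np.float32),   # unrelated point
]

accepted, rejected = filter_requests(index, genuine + forged)
print(len(accepted), rejected)
```

Exact matching is deliberately strict, which is also its weakness: any benign change to a genuine example, such as re-encoding or resizing an image, changes the hash and causes a legitimate request to be rejected. Whether a looser check would still catch the stealthy attacks described in the paper is not something this sketch addresses.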

Abstract

Machine unlearning algorithms, designed for selective removal of training data from models, have emerged as a promising approach to growing privacy concerns. In this work, we expose a critical yet underexplored vulnerability in the deployment of unlearning systems: the assumption that the data requested for removal is always part of the original training set. We present a threat model where an attacker can degrade model accuracy by submitting adversarial unlearning requests for data not present in the training set. We propose white-box and black-box attack algorithms and evaluate them through a case study on image classification tasks using the CIFAR-10 and ImageNet datasets, targeting a family of widely used unlearning methods. Our results show extremely poor test accuracy following the attack: 3.6% on CIFAR-10 and 0.4% on ImageNet for white-box attacks, and 8.5% on CIFAR-10 and 1.3%…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Fault Detection and Control Systems