Unlearning's Blind Spots: Over-Unlearning and Prototypical Relearning Attack
SeungBum Ha, Saerom Park, Sung Whan Yoon

TL;DR
This paper identifies two blind spots in machine unlearning, over-unlearning and relearning attacks, and proposes Spotter, a plug-and-play objective that mitigates both, with results reported on CIFAR, TinyImageNet, and CASIA-WebFace.
Contribution
The paper introduces OU@epsilon to measure over-unlearning, exposes the Prototypical Relearning Attack, and proposes Spotter to counter both issues effectively.
Findings
Spotter reduces over-unlearning damage near forget sets.
Spotter neutralizes Prototypical Relearning Attacks with embedding dispersion.
Spotter achieves state-of-the-art results on CIFAR, TinyImageNet, and CASIA-WebFace.
Abstract
Machine unlearning (MU) aims to expunge a designated forget set from a trained model without costly retraining, yet the existing techniques overlook two critical blind spots: "over-unlearning" that deteriorates retained data near the forget set, and post-hoc "relearning" attacks that aim to resurrect the forgotten knowledge. Focusing on class-level unlearning, we first derive an over-unlearning metric, OU@epsilon, which quantifies collateral damage in regions proximal to the forget set, where over-unlearning mainly appears. Next, we expose an unforeseen relearning threat on MU, i.e., the Prototypical Relearning Attack, which exploits the per-class prototype of the forget class with just a few samples, and easily restores the pre-unlearning performance. To counter both blind spots in class-level unlearning, we introduce Spotter, a plug-and-play objective that combines (i) a masked…
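The abstract describes the Prototypical Relearning Attack only at a high level: a few forget-class samples are turned into a per-class prototype that restores the forgotten class. The sketch below is one possible reading of that idea, not the authors' implementation. It assumes a frozen feature extractor followed by a linear classification head, and it overwrites the forget-class row of the head with the normalized mean embedding of the few-shot samples. The function names (class_prototype, prototypical_relearn, predict), the toy data, and the choice of scaling are all hypothetical.

```python
import numpy as np


def class_prototype(embeddings: np.ndarray) -> np.ndarray:
    """Unit-norm mean of L2-normalized embeddings (one row per sample)."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    proto = normed.mean(axis=0)
    return proto / np.linalg.norm(proto)


def prototypical_relearn(weights, few_shot_embeddings, forget_class):
    """Return a copy of the head with the forget-class row set to the prototype.

    Assumption: the prototype is rescaled to the average norm of the
    remaining class rows so that its logits are comparable to theirs.
    """
    new_w = weights.copy()
    others = np.delete(weights, forget_class, axis=0)
    scale = np.linalg.norm(others, axis=1).mean()
    new_w[forget_class] = scale * class_prototype(few_shot_embeddings)
    return new_w


def predict(weights, embeddings):
    """Linear head without bias: argmax over class logits."""
    return (embeddings @ weights.T).argmax(axis=1)


# Toy setup (hypothetical): 3 classes in an 8-dimensional embedding space.
rng = np.random.default_rng(0)
dim, forget_class = 8, 2

# Stand-in for an "unlearned" head: the forget-class row points away
# from where forget-class embeddings actually lie.
unlearned_w = np.eye(3, dim)
unlearned_w[forget_class] *= -1.0


def sample_forget(n):
    """Forget-class embeddings: a fixed center plus small Gaussian noise."""
    center = 5.0 * np.eye(3, dim)[forget_class]
    return center + 0.1 * rng.standard_normal((n, dim))


few_shot = sample_forget(5)     # the attacker's handful of samples
held_out = sample_forget(200)   # evaluation samples from the forget class

acc_before = float((predict(unlearned_w, held_out) == forget_class).mean())
attacked_w = prototypical_relearn(unlearned_w, few_shot, forget_class)
acc_after = float((predict(attacked_w, held_out) == forget_class).mean())

print(f"forget-class accuracy before attack: {acc_before:.2f}")
print(f"forget-class accuracy after attack:  {acc_after:.2f}")
```

In this toy setting the attack works because the embeddings of the forget class remain tightly clustered after unlearning, so a few samples are enough to locate the cluster. That is consistent with the Findings above, which credit embedding dispersion for neutralizing the attack.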
Taxonomy
Topics: Communication in Education and Healthcare
