Evaluating the Defense Potential of Machine Unlearning against Membership Inference Attacks
Theodoros Tsiolakis, Vasilis Perifanis, Nikolaos Pavlidis, Christos Chrysanthos Nikolaidis, Aristeidis Sidiropoulos, Pavlos S. Efraimidis

TL;DR
This paper empirically evaluates the potential of machine unlearning algorithms as targeted defenses against membership inference attacks, highlighting their effectiveness and limitations across different models and datasets.
Contribution
It presents an empirical assessment of three machine unlearning (MU) algorithms as privacy defenses against membership inference attacks (MIAs), analyzing their performance and the factors that affect it.
Findings
MU can mitigate MIAs, but its effectiveness depends on algorithm choice and model capacity.
Negative Gradient often degrades membership signals broadly.
SFTC can reinforce membership signals of retained data.
Abstract
Membership Inference Attacks (MIAs) pose a significant privacy risk by enabling adversaries to determine whether a specific data point was part of a model's training set. This work empirically investigates whether machine unlearning (MU) algorithms can function as a targeted, active defense mechanism in scenarios where a privacy audit identifies specific classes or individuals as highly susceptible to MIAs after training. By 'dulling' the model's categorical memory of these samples, unlearning mitigates the membership signal and reduces the MIA success rate for the most vulnerable users. We evaluate the defense potential of three MU algorithms: Negative Gradient (neg grad), SCalable Remembering and Unlearning unBound (SCRUB), and Selective Fine-tuning and Targeted Confusion (SFTC), across four diverse datasets and three complexity-based model groups. Our findings reveal that MU can function as a…
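The two ingredients the abstract combines, a membership inference attack and an unlearning step applied to audit-flagged samples, can be illustrated with a toy sketch. It assumes a logistic-regression model on synthetic 2-D data. It uses the simple loss-threshold attack as a stand-in MIA, which is not necessarily one of the attacks evaluated in the paper. It implements Negative Gradient as plain gradient ascent on the forget set's loss. All function names, the synthetic data, the forget set, and the hyperparameters (`lr`, `steps`, the threshold) are hypothetical illustrations, not the authors' setup. SCRUB and SFTC are not sketched.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def bce_loss(w, b, x, y):
    # Per-sample binary cross-entropy; the clamp avoids log(0).
    p = min(max(predict(w, b, x), 1e-12), 1.0 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def mean_loss(w, b, data):
    return sum(bce_loss(w, b, x, y) for x, y in data) / len(data)

def mean_grad(w, b, data):
    # Gradient of the mean BCE w.r.t. (w, b): averages of (p - y) * x and (p - y).
    gw, gb = [0.0] * len(w), 0.0
    for x, y in data:
        err = predict(w, b, x) - y
        gw = [g + err * xi for g, xi in zip(gw, x)]
        gb += err
    n = len(data)
    return [g / n for g in gw], gb / n

def train(data, dim, lr=0.5, epochs=200):
    # Plain full-batch gradient descent on the training set.
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = mean_grad(w, b, data)
        w = [wi - lr * gi for wi, gi in zip(w, gw)]
        b -= lr * gb
    return w, b

def negative_gradient_unlearn(w, b, forget, lr=0.1, steps=20):
    # Hypothetical Negative Gradient step: the training update with its sign flipped,
    # i.e. gradient *ascent* on the forget-set loss.
    for _ in range(steps):
        gw, gb = mean_grad(w, b, forget)
        w = [wi + lr * gi for wi, gi in zip(w, gw)]
        b += lr * gb
    return w, b

def loss_threshold_mia(w, b, samples, threshold):
    # Stand-in MIA: guess "member" whenever the per-sample loss is below the threshold.
    return [bce_loss(w, b, x, y) < threshold for x, y in samples]

def mia_advantage(w, b, members, non_members, threshold):
    # Advantage = TPR on members - FPR on non-members (0 means chance level).
    tpr = sum(loss_threshold_mia(w, b, members, threshold)) / len(members)
    fpr = sum(loss_threshold_mia(w, b, non_members, threshold)) / len(non_members)
    return tpr - fpr

def make_data(rng, n):
    # Synthetic data: two overlapping 2-D Gaussian blobs centred at (-1, -1) and (1, 1).
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        c = 1.0 if y else -1.0
        data.append(([c + rng.gauss(0, 1), c + rng.gauss(0, 1)], y))
    return data

rng = random.Random(0)
train_set, test_set = make_data(rng, 200), make_data(rng, 200)
forget = train_set[:20]  # hypothetical samples an audit flagged as vulnerable

w, b = train(train_set, dim=2)
threshold = mean_loss(w, b, train_set)  # assumed attacker threshold: mean training loss
before = mean_loss(w, b, forget)
adv_before = mia_advantage(w, b, forget, test_set, threshold)

w_u, b_u = negative_gradient_unlearn(w, b, forget)
after = mean_loss(w_u, b_u, forget)
adv_after = mia_advantage(w_u, b_u, forget, test_set, threshold)

print(f"forget-set loss {before:.3f} -> {after:.3f}")
print(f"MIA advantage   {adv_before:.3f} -> {adv_after:.3f}")
```

Because the logistic loss is convex, each ascent step cannot decrease the forget-set loss, which is what pushes those samples toward the attacker's "non-member" side of the threshold. The ascent also moves weights shared with the retained data, so retained samples' losses can shift as well. This is one plausible reading of the finding that Negative Gradient degrades membership signals broadly. Whether the advantage actually drops on this toy data depends on the seed and the threshold.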
