Memory Backdoor Attacks on Neural Networks
Eden Luzon, Guy Amit, Roy Weiss, Torsten Krauß, Alexandra Dmitrienko, Yisroel Mirsky

TL;DR
This paper introduces a novel backdoor attack in federated learning that enables a malicious server to systematically and exactly extract private training data from client models with minimal impact on model utility.
Contribution
The authors present a new training-time backdoor attack that guarantees precise data recovery and high robustness, exposing a critical privacy vulnerability in federated learning systems.
Findings
Thousands of sensitive samples can be recovered from client models.
Complete datasets can be stolen after multiple federated learning rounds.
Minimal utility drop (e.g., 3%) in models during data extraction.
Abstract
Neural networks are often trained on proprietary datasets, making them attractive attack targets. We present a novel dataset extraction method leveraging an innovative training-time backdoor attack, allowing a malicious federated learning server to systematically and deterministically extract complete client training samples through a simple indexing process. Unlike prior techniques, our approach guarantees exact data recovery rather than probabilistic reconstructions or hallucinations, provides precise control over which samples are memorized and how many, and shows high capacity and robustness. Infected models output data samples when they receive a pattern-based index trigger, enabling systematic extraction of meaningful patches from each client's local data without disrupting global model utility. To address small model output sizes, we extract patches and then recombine them. The…
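The patch-and-recombine step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the trigger encoding or the model, so a plain dictionary keyed by a hypothetical (sample index, grid row, grid column) tuple stands in for the backdoored model, and the names `split_into_patches`, `recombine`, and `extract_sample` are invented for illustration.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]           # grayscale image as rows of pixel values
Index = Tuple[int, int, int]      # (sample index, grid row, grid column); assumed layout


def split_into_patches(image: Image, patch: int) -> Dict[Tuple[int, int], Image]:
    """Cut an image into non-overlapping patch x patch tiles keyed by grid position."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    tiles: Dict[Tuple[int, int], Image] = {}
    for r in range(0, height, patch):
        for c in range(0, width, patch):
            tiles[(r // patch, c // patch)] = [row[c:c + patch] for row in image[r:r + patch]]
    return tiles


def recombine(tiles: Dict[Tuple[int, int], Image], rows: int, cols: int, patch: int) -> Image:
    """Place each tile back at its grid position to rebuild the full image."""
    image = [[0] * (cols * patch) for _ in range(rows * patch)]
    for (gr, gc), tile in tiles.items():
        for i, tile_row in enumerate(tile):
            image[gr * patch + i][gc * patch:(gc + 1) * patch] = tile_row
    return image


def extract_sample(memory: Dict[Index, Image], sample: int, rows: int, cols: int, patch: int) -> Image:
    """Query every patch index of one sample, then recombine the results.

    `memory` is a stand-in for a backdoored model answering index triggers (assumption).
    """
    tiles = {(gr, gc): memory[(sample, gr, gc)] for gr in range(rows) for gc in range(cols)}
    return recombine(tiles, rows, cols, patch)


# Toy "memorization": store the patches of one 4x4 image under sample index 0.
original = [[r * 4 + c for c in range(4)] for r in range(4)]
memory = {(0, gr, gc): tile for (gr, gc), tile in split_into_patches(original, 2).items()}
recovered = extract_sample(memory, sample=0, rows=2, cols=2, patch=2)
```

In this toy setting the recovered image equals the original exactly, which mirrors the abstract's claim of exact rather than probabilistic recovery; a real attack would replace the dictionary lookup with a forward pass on a trigger input.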
Taxonomy
Topics: Adversarial Robustness in Machine Learning
