Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching
Jonas Geiping, Liam Fowl, W. Ronny Huang, Wojciech Czaja, Gavin Taylor, Michael Moeller, Tom Goldstein

TL;DR
This paper introduces a scalable, targeted data poisoning attack using gradient matching that can cause misclassification in large, modern deep neural networks trained from scratch, highlighting a significant security threat.
Contribution
It presents the first effective large-scale poisoning method that works on full-sized datasets like ImageNet, demonstrating vulnerabilities in current defenses.
Findings
First successful targeted poisoning attack against ImageNet models trained from scratch
Attack remains nearly imperceptible and effective against modern models
Existing defenses are insufficient against this threat
Abstract
Data poisoning attacks modify training data to maliciously control a model trained on such data. In this work, we focus on targeted poisoning attacks, which cause a reclassification of an unmodified test image and as such breach model integrity. We consider a particularly malicious poisoning attack that is both "from scratch" and "clean label", meaning we analyze an attack that successfully works against new, randomly initialized models, and is nearly imperceptible to humans, all while perturbing only a small fraction of the training data. Previous poisoning attacks against deep neural networks in this setting have been limited in scope and success, working only in simplified settings or being prohibitively expensive for large datasets. The central mechanism of the new attack is matching the gradient direction of malicious examples. We analyze why this works, supplement with practical…
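As an illustration of what "matching the gradient direction" can mean, the following is a minimal sketch and not the authors' implementation. It assumes a linear softmax classifier with an analytic cross-entropy gradient, and scores a set of poison examples by one minus the cosine similarity between their training gradient and the gradient that would push a target image toward an adversarial label. The function names (`loss_grad`, `gradient_matching_loss`) and the linear model are assumptions made for this example only.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def loss_grad(W, X, y):
    """Gradient of the mean cross-entropy loss w.r.t. W
    for a linear softmax classifier with logits = X @ W."""
    p = softmax(X @ W)
    p[np.arange(len(y)), y] -= 1.0  # softmax probabilities minus one-hot labels
    return X.T @ p / len(y)

def gradient_matching_loss(W, X_poison, y_poison, x_target, y_adv):
    """1 - cosine similarity between the adversarial target gradient
    and the training gradient on the (perturbed) poison examples.
    Hypothetical helper for illustration; lower means better aligned."""
    g_target = loss_grad(W, x_target[None, :], np.array([y_adv]))
    g_poison = loss_grad(W, X_poison, y_poison)
    denom = np.linalg.norm(g_target) * np.linalg.norm(g_poison) + 1e-12
    return 1.0 - np.sum(g_target * g_poison) / denom

# Toy usage with random data: 4 poison candidates, 5 features, 3 classes.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))
X_poison = rng.normal(size=(4, 5))
y_poison = np.array([0, 1, 2, 0])   # labels stay unchanged ("clean label")
x_target = rng.normal(size=5)
print(gradient_matching_loss(W, X_poison, y_poison, x_target, y_adv=2))
```

In an attack of this kind, the perturbations added to the poison examples would be optimized to drive this loss down while staying within a small norm bound. That optimization loop, and the use of a deep network in place of the linear model, are omitted here.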
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Advanced Neural Network Applications
