MetaPoison: Practical General-purpose Clean-label Data Poisoning
W. Ronny Huang, Jonas Geiping, Liam Fowl, Gavin Taylor, Tom Goldstein

TL;DR
MetaPoison introduces a practical, general-purpose method for clean-label data poisoning that effectively fools neural networks across various models and training scenarios, including real-world API systems.
Contribution
The paper presents MetaPoison, a first-order meta-learning approach that approximates the bilevel optimization problem behind data poisoning, yielding effective, transferable, and versatile attacks on deep neural networks.
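For orientation, the bilevel problem mentioned above can be written roughly as follows. The notation is an assumption made for this summary rather than a quotation from the paper: \delta is the bounded perturbation applied to the poison images X_p, X_c is the clean training data, (x_t, y_adv) is the target example with the attacker's desired label, and K and \alpha are the number of unrolled steps and the learning rate.

```latex
% Outer problem: choose the perturbation that makes the trained model
% assign the adversarial label to the target.
\min_{\delta \in \mathcal{C}} \;
  \mathcal{L}_{\mathrm{adv}}\bigl(x_t, y_{\mathrm{adv}};\, \theta^{*}(\delta)\bigr)
\quad \text{s.t.} \quad
\theta^{*}(\delta) = \arg\min_{\theta} \;
  \mathcal{L}_{\mathrm{train}}\bigl(X_c \cup (X_p + \delta),\, Y;\, \theta\bigr)

% Unrolled approximation: replace the exact inner minimizer by K
% gradient steps and differentiate the outer loss through them.
\theta_{k+1} = \theta_k - \alpha \,
  \nabla_{\theta} \mathcal{L}_{\mathrm{train}}\bigl(X_c \cup (X_p + \delta),\, Y;\, \theta_k\bigr),
\qquad k = 0, \dots, K-1,
\qquad
\theta^{*}(\delta) \approx \theta_K(\delta)
```

The inner minimization is what makes the exact problem expensive for deep models; truncating it to a few steps is what makes the outer gradient with respect to \delta computable in practice.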
Findings
MetaPoison outperforms previous clean-label poisoning methods.
Poisoned data crafted for one model transfer to victim models with unknown architectures and training settings.
Successful real-world poisoning of Google Cloud AutoML models.
Abstract
Data poisoning, the process by which an attacker takes control of a model by making imperceptible changes to a subset of the training data, is an emerging threat in the context of neural networks. Existing data poisoning attacks on neural networks have relied on hand-crafted heuristics, because solving the poisoning problem directly via bilevel optimization is generally thought of as intractable for deep models. We propose MetaPoison, a first-order method that approximates the bilevel problem via meta-learning and crafts poisons that fool neural networks. MetaPoison is effective: it outperforms previous clean-label poisoning methods by a large margin. MetaPoison is robust: poisoned data made for one model transfer to a variety of victim models with unknown training settings and architectures. MetaPoison is general-purpose: it works not only in fine-tuning scenarios, but also for…
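The idea of approximating the bilevel problem by unrolling training can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a two-dimensional logistic-regression victim, a single poison point, hypothetical function names (`train_unrolled`, `adversarial_loss`, `craft_poison`), and made-up data. For simplicity the outer gradient is estimated by central finite differences, whereas a real implementation would backpropagate through the unrolled steps with an autodiff framework.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def logistic_loss(w, x, y):
    """Cross-entropy of a linear classifier on one example, y in {0, 1}."""
    p = min(max(sigmoid(dot(w, x)), 1e-12), 1.0 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def train_unrolled(data, steps, lr):
    """Truncated inner problem: `steps` gradient-descent steps from a zero init."""
    dim = len(data[0][0])
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for x, y in data:
            err = sigmoid(dot(w, x)) - y  # d(loss)/d(logit) for logistic loss
            for j in range(dim):
                grad[j] += err * x[j] / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def adversarial_loss(delta, clean, poison, target, y_adv, steps=5, lr=0.5):
    """Outer objective: loss of the target under the adversarial label,
    evaluated on weights trained on clean data plus the perturbed poison."""
    x_p, y_p = poison  # the poison keeps its original (clean) label
    perturbed = ([xi + di for xi, di in zip(x_p, delta)], y_p)
    w = train_unrolled(clean + [perturbed], steps, lr)
    return logistic_loss(w, target, y_adv)

def craft_poison(clean, poison, target, y_adv, eps=0.5,
                 outer_steps=20, step_size=0.1, h=1e-4):
    """Signed-gradient descent on delta, clipped to an L-infinity ball of radius eps."""
    delta = [0.0] * len(poison[0])
    for _ in range(outer_steps):
        grad = []
        for j in range(len(delta)):
            plus = list(delta)
            plus[j] += h
            minus = list(delta)
            minus[j] -= h
            # Central finite difference (stand-in for backprop through training).
            grad.append((adversarial_loss(plus, clean, poison, target, y_adv)
                         - adversarial_loss(minus, clean, poison, target, y_adv))
                        / (2 * h))
        stepped = [d - step_size * ((g > 0) - (g < 0)) for d, g in zip(delta, grad)]
        delta = [min(eps, max(-eps, d)) for d in stepped]
    return delta

# Made-up toy data: class 0 on the left, class 1 on the right.
clean = [([-1.0, 0.2], 0), ([-1.2, -0.3], 0), ([1.0, 0.1], 1), ([1.1, -0.2], 1)]
poison = ([1.0, 0.5], 1)   # base poison point with its correct label
target = [-0.3, 0.8]       # target point the attacker wants classified as 1
y_adv = 1

delta = craft_poison(clean, poison, target, y_adv)
before = adversarial_loss([0.0, 0.0], clean, poison, target, y_adv)
after = adversarial_loss(delta, clean, poison, target, y_adv)
print("adversarial loss before:", before, "after:", after)
```

The label of the poison point is never changed, which is what makes the attack clean-label in this sketch; only its features move, and only within the `eps` bound.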
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
