Certified Defenses for Data Poisoning Attacks
Jacob Steinhardt, Pang Wei Koh, Percy Liang

TL;DR
This paper constructs approximate upper bounds on the worst-case loss of data poisoning defenses, giving a way to evaluate their robustness, and shows that resilience to poisoning varies sharply across datasets.
Contribution
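It introduces approximate upper bounds on the loss of defenses that perform outlier removal followed by empirical risk minimization. The bounds assume that the dataset is large enough for train and test error to concentrate and that outliers in the clean data have little effect on the model. The bounds are validated empirically.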
Findings
The MNIST-1-7 and Dogfish datasets are resilient to poisoning attacks.
On the IMDB dataset, test error rises from 12% to 23% when 3% poisoned data is added.
The bounds often closely match the performance of candidate attacks.
Abstract
Machine learning systems trained on user-provided data are susceptible to data poisoning attacks, whereby malicious users inject false training data with the aim of corrupting the learned model. While recent work has proposed a number of attacks and defenses, little is understood about the worst-case loss of a defense in the face of a determined attacker. We address this by constructing approximate upper bounds on the loss across a broad family of attacks, for defenders that first perform outlier removal followed by empirical risk minimization. Our approximation relies on two assumptions: (1) that the dataset is large enough for statistical concentration between train and test error to hold, and (2) that outliers within the clean (non-poisoned) data do not have a strong effect on the model. Our bound comes paired with a candidate attack that often nearly matches the upper bound, giving…
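The defender described in the abstract (outlier removal followed by empirical risk minimization) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the centroid-distance filter, the per-class quantile threshold, the hinge-loss linear model, and the function names `sphere_filter`, `train_hinge`, and `defend_and_train` are all hypothetical choices.

```python
import numpy as np

def sphere_filter(X, y, quantile=0.9):
    """Outlier removal (illustrative rule): keep points whose distance to
    their class centroid is at most the given per-class quantile."""
    keep = np.zeros(len(y), dtype=bool)
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        centroid = X[idx].mean(axis=0)  # centroid is computed on possibly poisoned data
        dist = np.linalg.norm(X[idx] - centroid, axis=1)
        radius = np.quantile(dist, quantile)
        keep[idx[dist <= radius]] = True
    return keep

def train_hinge(X, y, lam=0.01, epochs=200, lr=0.1):
    """Empirical risk minimization: mean hinge loss plus L2 penalty,
    minimized by plain subgradient descent. Labels must be in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # points that contribute a nonzero subgradient
        grad_w = -(y[active, None] * X[active]).sum(axis=0) / n + lam * w
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def defend_and_train(X, y, quantile=0.9):
    """Run the two-stage defense: filter outliers, then fit on what remains."""
    keep = sphere_filter(X, y, quantile)
    w, b = train_hinge(X[keep], y[keep])
    return w, b, keep
```

A filter like this only removes poisoned points that lie far from their class centroid. The worst-case analysis in the paper concerns attackers who place points inside whatever region the defense accepts, which is why a bound on the loss is needed in addition to the filter itself.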
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection
