Stronger Data Poisoning Attacks Break Data Sanitization Defenses
Pang Wei Koh, Jacob Steinhardt, Percy Liang

TL;DR
This paper presents three data poisoning attacks that bypass common data sanitization defenses. With just 3% poisoned data, the attacks substantially raise test error on two real datasets (Enron and IMDB), which highlights the need for more robust defenses.
Contribution
The authors develop three poisoning attacks that evade existing data sanitization defenses by formulating the attack as a constrained optimization problem.
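As a rough illustration of what "constrained optimization" can mean here, the block below writes a generic data poisoning objective. It is a sketch under stated assumptions, not necessarily the exact formulation in the paper: the symbols $\mathcal{D}_c$ (clean training set of size $n$), $\mathcal{D}_p$ (poisoned points), $\epsilon$ (poisoning fraction, e.g. 0.03), $\mathcal{F}$ (the set of points the sanitization defense would not filter out), and $L$ (the loss) are all introduced here for illustration.

```latex
\begin{aligned}
\max_{\mathcal{D}_p} \quad & L\bigl(\hat{\theta};\, \mathcal{D}_{\text{test}}\bigr) \\
\text{subject to} \quad & |\mathcal{D}_p| = \epsilon n, \qquad \mathcal{D}_p \subseteq \mathcal{F}, \\
& \hat{\theta} = \arg\min_{\theta}\; L\bigl(\theta;\, \mathcal{D}_c \cup \mathcal{D}_p\bigr)
\end{aligned}
```

Read this way, the attacker picks a small set of points that survive filtering (the constraint $\mathcal{D}_p \subseteq \mathcal{F}$) while making the model trained on the combined data perform as badly as possible on test data.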
Findings
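With 3% poisoned data, the attacks increase test error from 3% to 24% on the Enron spam detection dataset.
With 3% poisoned data, the attacks increase test error from 12% to 29% on the IMDB sentiment classification dataset.
The data sanitization defenses considered (anomaly detectors based on nearest neighbors, training loss, and singular-value decomposition) fail to stop these attacks, although they do defeat existing attacks that do not account for them.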
Abstract
Machine learning models trained on data from the outside world can be corrupted by data poisoning attacks that inject malicious points into the models' training sets. A common defense against these attacks is data sanitization: first filter out anomalous training points before training the model. In this paper, we develop three attacks that can bypass a broad range of common data sanitization defenses, including anomaly detectors based on nearest neighbors, training loss, and singular-value decomposition. By adding just 3% poisoned data, our attacks successfully increase test error on the Enron spam detection dataset from 3% to 24% and on the IMDB sentiment classification dataset from 12% to 29%. In contrast, existing attacks which do not explicitly account for these data sanitization defenses are defeated by them. Our attacks are based on two ideas: (i) we coordinate our attacks to…
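To make the idea of data sanitization concrete, here is a minimal sketch of one kind of defense the abstract mentions, an anomaly detector based on nearest neighbors. It is an illustrative assumption rather than the defense evaluated in the paper: the function name `knn_sanitize` and the parameters `k` and `drop_fraction` are hypothetical, and the scoring rule (distance to the k-th nearest neighbor of the same class) is just one common choice.

```python
import numpy as np

def knn_sanitize(X, y, k=5, drop_fraction=0.05):
    """Hypothetical k-NN sanitizer: drop the points that lie farthest
    from their k-th nearest neighbor within the same class."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    scores = np.zeros(len(X))

    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        pts = X[idx]
        # Pairwise Euclidean distances between points of this class.
        diffs = pts[:, None, :] - pts[None, :, :]
        dists = np.sqrt((diffs ** 2).sum(axis=-1))
        # After sorting each row, column 0 is the distance to the point itself.
        kk = min(k, len(idx) - 1)
        scores[idx] = np.sort(dists, axis=1)[:, kk]

    # Remove the highest-scoring (most isolated) fraction of points.
    n_drop = int(np.floor(drop_fraction * len(X)))
    keep = np.ones(len(X), dtype=bool)
    if n_drop > 0:
        keep[np.argsort(scores)[-n_drop:]] = False
    return X[keep], y[keep], keep
```

A filter like this removes isolated outliers before training. The attacks summarized above are reported to bypass detectors of this general kind, so the sketch shows what is being evaded rather than a recommended defense.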
Taxonomy
Topics: Adversarial Robustness in Machine Learning · COVID-19 diagnosis using AI · Anomaly Detection Techniques and Applications
