TL;DR
This paper presents a new data poisoning attack that significantly reduces the certified robustness of models trained with randomized smoothing, even when the victim uses state-of-the-art robust training methods.
Contribution
It introduces a bilevel optimization-based poisoning method that degrades certified robustness, highlighting the importance of data quality in robust machine learning.
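As a rough illustration of what "bilevel optimization-based poisoning" means, a generic formulation can be written as below. This is a sketch of the general template, not necessarily the paper's exact objective: the symbols $\delta$ (poison perturbations), $\epsilon$ (perturbation budget), $\mathcal{D}_{\text{clean}}$, $\mathcal{D}_{\text{val}}^{\text{target}}$, and the choice of norm are assumptions introduced here for illustration.

```latex
\begin{aligned}
\min_{\delta} \quad & \mathrm{ACR}\!\left(\mathcal{D}_{\text{val}}^{\text{target}};\ \theta^{*}(\delta)\right) \\
\text{s.t.} \quad & \theta^{*}(\delta) \in \arg\min_{\theta}\ \mathcal{L}_{\text{train}}\!\left(\mathcal{D}_{\text{clean}} \cup \{(x_i + \delta_i,\ y_i)\}_{i=1}^{n};\ \theta\right), \\
& \|\delta_i\| \le \epsilon, \quad i = 1, \dots, n.
\end{aligned}
```

The outer problem picks perturbations that lower the average certified radius of the target class, while the inner problem models the victim training a model on the poisoned data.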
Findings
Reduces the average certified radius (ACR) of the target class by over 30% on MNIST and CIFAR10.
Effective against models trained with advanced robust training techniques.
Poisoned data transfers across different models and training methods.
Abstract
Predictions of certifiably robust classifiers remain constant in a neighborhood of a point, making them resilient to test-time attacks with a guarantee. In this work, we present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality in achieving high certified adversarial robustness. Specifically, we propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers. Unlike other poisoning attacks that reduce the accuracy of the poisoned models on a small set of target points, our attack reduces the average certified radius (ACR) of an entire target class in the dataset. Moreover, our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods such as Gaussian data…
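To make the ACR metric concrete, the following minimal sketch computes a certified radius and its average over a set of points. It assumes the standard randomized-smoothing radius $\sigma \, \Phi^{-1}(\underline{p_A})$ for Gaussian noise, with misclassified points counted as radius 0; the function names and the `(p_a_lower, is_correct)` record format are hypothetical and chosen only for illustration, and the numbers in the usage example are made up.

```python
from statistics import NormalDist


def certified_radius(p_a_lower: float, sigma: float) -> float:
    """L2 certified radius of a Gaussian-smoothed classifier at one point.

    p_a_lower: lower confidence bound on the top-class probability under noise.
    sigma:     standard deviation of the Gaussian smoothing noise.
    Returns 0.0 when the bound does not exceed 1/2 (no certificate).
    """
    if not 0.0 <= p_a_lower < 1.0:
        raise ValueError("p_a_lower must lie in [0, 1)")
    if p_a_lower <= 0.5:
        return 0.0
    # Radius = sigma * Phi^{-1}(p_a_lower), Phi being the standard normal CDF.
    return sigma * NormalDist().inv_cdf(p_a_lower)


def average_certified_radius(records, sigma: float) -> float:
    """Mean certified radius over (p_a_lower, is_correct) records.

    Misclassified points contribute a radius of 0.
    """
    if not records:
        raise ValueError("records must be non-empty")
    radii = [
        certified_radius(p, sigma) if is_correct else 0.0
        for p, is_correct in records
    ]
    return sum(radii) / len(radii)


if __name__ == "__main__":
    # Made-up illustrative values, not results from the paper.
    clean = [(0.99, True), (0.95, True), (0.90, True)]
    poisoned = [(0.90, True), (0.70, True), (0.60, False)]
    print("clean ACR:   ", average_certified_radius(clean, sigma=0.5))
    print("poisoned ACR:", average_certified_radius(poisoned, sigma=0.5))
```

An attack of the kind described above succeeds when the second number is substantially lower than the first for the target class, even if accuracy barely changes.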
