TL;DR
This paper introduces a randomized ablation technique that certifies and improves classifier robustness against sparse adversarial attacks, extending randomized-smoothing guarantees from the L_1 and L_2 settings to the L_0 threat model.
Contribution
It proposes an efficient, certifiably robust defense using feature ablation, providing tighter robustness certificates than previous additive noise methods.
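The ablation-and-vote idea can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the names `ablate` and `smoothed_predict`, the use of `None` as the "ablated" marker, and the sample count are all assumptions made for the example.

```python
import random
from collections import Counter

ABLATED = None  # hypothetical marker for a removed feature (an assumption)


def ablate(x, k, rng):
    """Keep k randomly chosen features of x and mark all others as ablated."""
    keep = set(rng.sample(range(len(x)), k))
    return [v if i in keep else ABLATED for i, v in enumerate(x)]


def smoothed_predict(base_classifier, x, k, n_samples=1000, seed=0):
    """Return the majority label over random ablations and its vote share."""
    rng = random.Random(seed)
    votes = Counter(base_classifier(ablate(x, k, rng)) for _ in range(n_samples))
    label, count = votes.most_common(1)[0]
    return label, count / n_samples
```

Here `base_classifier` is any callable that accepts a partially ablated input and returns a label. Because each base prediction sees only `k` features, a change to a few input features can affect only the votes whose retained set includes a changed feature.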
Findings
Certifies the classifications of over 50% of MNIST images as robust to any distortion of at most 8 pixels.
Achieves a median certified robustness of 8 pixels on MNIST, outperforming prior noise-based certificates.
Demonstrates high empirical robustness to sparse attacks with only slight accuracy decrease.
Abstract
Recently, techniques have been developed to provably guarantee the robustness of a classifier to adversarial perturbations of bounded L_1 and L_2 magnitudes by using randomized smoothing: the robust classification is a consensus of base classifications on randomly noised samples where the noise is additive. In this paper, we extend this technique to the L_0 threat model. We propose an efficient and certifiably robust defense against sparse adversarial attacks by randomly ablating input features, rather than using additive noise. Experimentally, on MNIST, we can certify the classifications of over 50% of images to be robust to any distortion of at most 8 pixels. This is comparable to the observed empirical robustness of unprotected classifiers on MNIST to modern L_0 attacks, demonstrating the tightness of the proposed robustness certificate. We also evaluate our certificate on ImageNet…
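To make the certificate concrete, here is a hedged sketch of the counting argument behind it. The abstract does not state the formula, so treat the following as an assumption: if `k` of `d` features are retained uniformly at random and an attacker changes `rho` features, the retained set touches a changed feature with probability 1 - C(d - rho, k) / C(d, k). The function names and the simplified majority condition (lower bound on the top-class vote share above 0.5 plus that probability) are illustrative choices, not the paper's exact procedure. `math.comb` requires Python 3.8 or later.

```python
from math import comb


def overlap_probability(d, k, rho):
    """Probability that a random k-subset of d features hits any of rho changed ones."""
    return 1.0 - comb(d - rho, k) / comb(d, k)


def certified_radius(p_lower, d, k):
    """Largest rho such that p_lower - overlap_probability(d, k, rho) > 0.5 (0 if none).

    p_lower is assumed to be a statistical lower bound on the top-class vote share.
    """
    rho = 0
    while rho + 1 <= d and p_lower - overlap_probability(d, k, rho + 1) > 0.5:
        rho += 1
    return rho
```

For MNIST, `d` would be 784 (28 x 28 pixels); the choice of `k` trades off base-classifier accuracy against the size of the certified radius, since retaining fewer pixels lowers the overlap probability but gives each base prediction less information.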
