(De)Randomized Smoothing for Certifiable Defense against Patch Attacks
Alexander Levine, Soheil Feizi

TL;DR
This paper introduces a certifiable defense against patch adversarial attacks on images. It provides deterministic robustness guarantees, trains faster and reaches higher certified accuracy than previous methods on CIFAR-10, and scales to ImageNet.
Contribution
The authors develop a de-randomized smoothing-based certification method specifically for patch attacks, achieving faster training and higher certified accuracy than prior approaches.
Findings
Achieves up to 57.6% certified accuracy on CIFAR-10 for 5x5 patches.
Provides certificates at ImageNet scale.
Trains faster and reaches higher certified accuracy than existing patch certification methods.
Abstract
Patch adversarial attacks on images, in which the attacker can distort pixels within a region of bounded size, are an important threat model since they provide a quantitative model for physical adversarial attacks. In this paper, we introduce a certifiable defense against patch attacks that guarantees that, for a given image and patch attack size, no patch adversarial examples exist. Our method is related to the broad class of randomized smoothing robustness schemes which provide high-confidence probabilistic robustness certificates. By exploiting the fact that patch attacks are more constrained than general sparse attacks, we derive meaningfully large robustness certificates against them. Additionally, in contrast to smoothing-based defenses against L_p and sparse attacks, our defense method against patch attacks is de-randomized, yielding improved, deterministic certificates. Compared to…
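The abstract does not spell out how a deterministic certificate is obtained, so the following is only a minimal sketch of one plausible vote-based scheme, not the authors' implementation. It assumes a hypothetical column-band ablation: a base classifier is run once per band position (with wrap-around), each time seeing only a vertical band of the image, and the per-band predictions are counted as votes. A patch of width m can overlap at most m + s - 1 positions of a band of width s, so at most that many votes can be changed. The function names, the band scheme, and the conservative strict margin are all assumptions made for illustration.

```python
from collections import Counter
from typing import Sequence, Tuple


def bands_touched(patch_width: int, band_width: int) -> int:
    """Upper bound on the number of band positions a patch can overlap.

    A band starting at column j covers j .. j + band_width - 1, so it meets a
    patch covering p .. p + patch_width - 1 only for
    patch_width + band_width - 1 choices of j.
    """
    return patch_width + band_width - 1


def certify_by_votes(
    votes: Sequence[int], patch_width: int, band_width: int
) -> Tuple[int, bool]:
    """Return (predicted class, certified flag) from per-band votes.

    `votes` holds one class label per band position (hypothetical base
    classifier outputs). The prediction is certified when the vote margin is
    too large for any patch to overturn: each affected band can remove one
    vote from the top class and add one to a rival.
    """
    if not votes:
        raise ValueError("need at least one vote")
    counts = Counter(votes)
    # Sort by descending count, breaking ties by the smaller class label.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top_class, top_count = ranked[0]
    # A class with no votes still counts as a rival with zero votes.
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    delta = bands_touched(patch_width, band_width)
    certified = (top_count - runner_up) > 2 * delta
    return top_class, certified


# Usage: 32 band positions (e.g. a 32-column image), 5-wide patch, 4-wide bands.
strong = [3] * 30 + [5] * 2   # margin 28 exceeds 2 * 8 = 16
weak = [3] * 20 + [5] * 12    # margin 8 does not
print(certify_by_votes(strong, patch_width=5, band_width=4))
print(certify_by_votes(weak, patch_width=5, band_width=4))
```

Because every band position is evaluated exactly once rather than sampled, a certificate of this form holds deterministically instead of with high probability, which is the distinction the abstract draws against standard randomized smoothing.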
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Physical Unclonable Functions (PUFs) and Hardware Security
Methods: Randomized Smoothing
