Certified Patch Robustness via Smoothed Vision Transformers
Hadi Salman, Saachi Jain, Eric Wong, Aleksander Mądry

TL;DR
This paper shows that using vision transformers in certified patch defenses gives image classifiers significantly better certified patch robustness and more efficient inference, without a substantial drop in standard accuracy.
Contribution
The paper applies vision transformers to certified patch defenses. Because vision transformers gracefully handle largely masked images, the resulting defenses are more robust and more computationally efficient than existing methods.
Findings
Significantly improved certified patch robustness.
More computationally efficient than previous methods.
No substantial drop in standard accuracy.
Abstract
Certified patch defenses can guarantee robustness of an image classifier to arbitrary changes within a bounded contiguous region. But, currently, this robustness comes at a cost of degraded standard accuracies and slower inference times. We demonstrate how using vision transformers enables significantly better certified patch robustness that is also more computationally efficient and does not incur a substantial drop in standard accuracy. These improvements stem from the inherent ability of the vision transformer to gracefully handle largely masked images. Our code is available at https://github.com/MadryLab/smoothed-vit.
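The abstract does not spell out the mechanism, so the following is only a minimal sketch of one way a masking-based certified patch defense can work. It assumes a column-ablation voting scheme in the style of derandomized smoothing: the classifier sees many copies of the image, each with only a narrow band of columns left visible, and the per-class votes are compared against the number of bands a patch could touch. All function names and parameters here (`column_ablation`, `visible_token_indices`, `certify`, `token_size`) are hypothetical and are not taken from the authors' repository. The token-selection helper illustrates why a transformer may handle such inputs cheaply: tokens that are fully masked can simply be skipped.

```python
import numpy as np


def column_ablation(image, start, width):
    """Keep `width` columns beginning at `start` (wrapping around); zero the rest.

    `image` has shape (height, width, channels). Returns the masked image
    and the indices of the visible columns.
    """
    _, w, _ = image.shape
    cols = (start + np.arange(width)) % w
    ablated = np.zeros_like(image)
    ablated[:, cols, :] = image[:, cols, :]
    return ablated, cols


def visible_token_indices(cols, image_size, token_size):
    """Row-major indices of square tokens that overlap a visible column.

    Assumes a square image whose side is divisible by `token_size`.
    Only these tokens would need to be fed to the transformer.
    """
    grid = image_size // token_size
    token_cols = np.unique(cols // token_size)
    return np.array([r * grid + tc for r in range(grid) for tc in token_cols])


def certify(votes, patch_width, ablation_width):
    """Return (predicted_class, is_certified) from per-class vote counts.

    A patch of width `patch_width` overlaps at most
    patch_width + ablation_width - 1 column ablations. Each affected vote
    can shift the margin by 2, so a margin above twice that count cannot
    be overturned. This is a conservative sufficient condition.
    """
    delta = patch_width + ablation_width - 1
    order = np.argsort(-votes, kind="stable")  # ties go to the lower class index
    top, runner_up = order[0], order[1]
    certified = votes[top] - votes[runner_up] > 2 * delta
    return int(top), bool(certified)


# Usage: an 8x8 RGB image, a 4-column band that wraps around the right edge.
image = np.ones((8, 8, 3))
ablated, cols = column_ablation(image, start=6, width=4)
tokens = visible_token_indices(cols, image_size=8, token_size=2)
prediction, certified = certify(np.array([3, 40, 5]), patch_width=4, ablation_width=4)
```

In this toy setup only half of the tokens overlap the visible band, which suggests where the inference savings mentioned in the abstract could come from when masked tokens are dropped rather than processed.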
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Digital Media Forensic Detection · Cell Image Analysis Techniques
Methods: Attention Is All You Need · Linear Layer · Softmax · Residual Connection · Multi-Head Attention · Layer Normalization · Dense Connections · Vision Transformer
