Certified Patch Robustness via Smoothed Vision Transformers

Hadi Salman; Saachi Jain; Eric Wong; Aleksander Mądry

arXiv:2110.07719·cs.CV·October 18, 2021·1 cites

TL;DR

This paper uses vision transformers to improve certified patch robustness in image classifiers, achieving better certified robustness and faster inference without a substantial drop in standard accuracy.

Contribution

The paper presents a novel approach leveraging vision transformers for certified patch defenses, enhancing robustness and computational efficiency over existing methods.

Findings

01 Significantly improved certified patch robustness.

02 More computationally efficient than previous methods.

03 Maintains high standard accuracy.

Abstract

Certified patch defenses can guarantee robustness of an image classifier to arbitrary changes within a bounded contiguous region. But, currently, this robustness comes at a cost of degraded standard accuracies and slower inference times. We demonstrate how using vision transformers enables significantly better certified patch robustness that is also more computationally efficient and does not incur a substantial drop in standard accuracy. These improvements stem from the inherent ability of the vision transformer to gracefully handle largely masked images. Our code is available at https://github.com/MadryLab/smoothed-vit.
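The abstract does not spell out how a certificate is computed. The following is a minimal sketch, assuming the column-ablation voting scheme from earlier derandomized-smoothing patch defenses, which this line of work builds on. The classifier is run once per column ablation (a mostly masked copy of the image), the predictions are counted as votes, and the prediction is certified when the vote margin exceeds twice the number of ablations a patch could touch. The function names and the strict-inequality condition are illustrative assumptions, not the interface of the authors' repository.

```python
from collections import Counter


def max_affected_ablations(patch_size: int, ablation_width: int) -> int:
    """Upper bound on how many column ablations a patch can intersect.

    Assumes one ablation per image column (with wrap-around), so a patch
    that is `patch_size` columns wide overlaps at most
    patch_size + ablation_width - 1 of them.
    """
    return patch_size + ablation_width - 1


def certify(votes, patch_size: int, ablation_width: int):
    """Return (predicted_class, is_certified).

    `votes` holds one predicted label per ablation. The adversary can flip
    at most `delta` votes, which changes the gap between the top class and
    any other class by at most 2 * delta, so a gap strictly larger than
    that cannot be overturned. This is a conservative sufficient condition.
    """
    counts = Counter(votes)
    # Sort by descending count, breaking ties by the smaller class label.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top_class, top_count = ranked[0]
    runner_up_count = ranked[1][1] if len(ranked) > 1 else 0
    delta = max_affected_ablations(patch_size, ablation_width)
    return top_class, (top_count - runner_up_count) > 2 * delta


if __name__ == "__main__":
    # Hypothetical vote counts for a 224-column image, not results from the paper.
    votes = [0] * 200 + [1] * 24
    print(certify(votes, patch_size=32, ablation_width=19))
```

Under this scheme the base classifier only ever sees mostly masked inputs, which is why a model that handles such inputs well translates directly into wider vote margins and more certified predictions.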

Code & Models

Repositories

madrylab/smoothed-vit (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Digital Media Forensic Detection · Cell Image Analysis Techniques

Methods: Attention Is All You Need · Linear Layer · Softmax · Residual Connection · Multi-Head Attention · Layer Normalization · Dense Connections · Vision Transformer