Towards Robust Vision Transformer via Masked Adaptive Ensemble

Fudong Lin, Jiadong Lou, Xu Yuan, and Nian-Feng Tzeng

arXiv:2407.15385 · cs.CV · July 23, 2024

TL;DR

This paper introduces a novel Vision Transformer architecture with an adaptive ensemble and detection mechanism to improve robustness against adversarial attacks while maintaining high standard accuracy.

Contribution

It proposes a new ViT design in which a detector and a classifier are joined by an adaptive ensemble, improving the trade-off between robustness and standard accuracy, and it introduces a patch masking technique for better defense against adaptive attacks.
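The summary above does not say how the patch masking is carried out, so the following is only a minimal sketch of what masking image patches can look like. The function name `mask_random_patches`, the patch size, the masking ratio, and the choice to zero out masked patches are all assumptions made for illustration, not the paper's procedure.

```python
import numpy as np

def mask_random_patches(image, patch=4, ratio=0.5, seed=0):
    """Zero out a random subset of non-overlapping square patches.

    image: array of shape (H, W, C); H and W must be multiples of `patch`.
    Returns the masked image and a boolean vector marking the kept patches.
    All parameters are illustrative assumptions, not values from the paper.
    """
    h, w, _ = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be multiples of patch")
    gh, gw = h // patch, w // patch          # patch grid dimensions
    n = gh * gw                              # total number of patches
    n_mask = int(round(ratio * n))           # how many patches to hide
    rng = np.random.default_rng(seed)
    masked_idx = rng.choice(n, size=n_mask, replace=False)
    keep = np.ones(n, dtype=bool)
    keep[masked_idx] = False
    # Expand the per-patch decision to a per-pixel mask of shape (H, W).
    pixel_mask = np.repeat(np.repeat(keep.reshape(gh, gw), patch, axis=0),
                           patch, axis=1)
    return image * pixel_mask[:, :, None], keep

# Usage: mask half of the 4x4 patches of a CIFAR-sized image.
img = np.ones((32, 32, 3))
masked, keep = mask_random_patches(img, patch=4, ratio=0.5)
```

Using a fixed seed makes the mask reproducible; a defense that relies on randomness would draw a fresh mask per input.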

Findings

01. Achieves 90.3% standard accuracy on CIFAR-10.

02. Attains 49.8% accuracy under adversarial attack (adversarial robustness).

03. Outperforms existing methods in the trade-off between standard accuracy and robustness.

Abstract

Adversarial training (AT) can help improve the robustness of Vision Transformers (ViT) against adversarial attacks by intentionally injecting adversarial examples into the training data. However, this way of adversarial injection inevitably incurs standard accuracy degradation to some extent, thereby calling for a trade-off between standard accuracy and robustness. Besides, the prominent AT solutions are still vulnerable to adaptive attacks. To tackle such shortcomings, this paper proposes a novel ViT architecture, including a detector and a classifier bridged by our newly developed adaptive ensemble. Specifically, we empirically discover that detecting adversarial examples can benefit from the Guided Backpropagation technique. Driven by this discovery, a novel Multi-head Self-Attention (MSA) mechanism is introduced to enhance our detector to sniff adversarial examples. Then, a…
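The adversarial training idea in the abstract, injecting adversarial examples into the training data, can be illustrated on a toy problem. The sketch below is not the paper's method: it swaps the ViT for a logistic regression model and uses the one-step Fast Gradient Sign Method (FGSM) to craft the adversarial examples. The function names, step size, and perturbation budget `eps` are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """One-step FGSM: shift each input by eps along the sign of the
    gradient of the binary cross-entropy loss with respect to the input."""
    p = sigmoid(x @ w + b)              # predicted P(y=1), shape (n,)
    grad_x = (p - y)[:, None] * w       # d(loss)/dx for logistic regression
    return x + eps * np.sign(grad_x)

def adversarial_train(x, y, eps=0.1, lr=0.5, steps=200, seed=0):
    """Toy adversarial training: at every step, craft adversarial copies of
    the batch with the current model and train on clean + adversarial data."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=x.shape[1])
    b = 0.0
    for _ in range(steps):
        x_adv = fgsm(x, y, w, b, eps)           # inject adversarial examples
        x_mix = np.concatenate([x, x_adv])
        y_mix = np.concatenate([y, y])
        p = sigmoid(x_mix @ w + b)
        w -= lr * x_mix.T @ (p - y_mix) / len(y_mix)   # gradient step on w
        b -= lr * np.mean(p - y_mix)                   # gradient step on b
    return w, b

# Usage: two Gaussian blobs as a stand-in dataset.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])
w, b = adversarial_train(x, y)
```

Because the model must also fit the perturbed copies, it typically gives up some accuracy on clean inputs, which is the accuracy and robustness trade-off the abstract refers to.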
