Protego: Detecting Adversarial Examples for Vision Transformers via Intrinsic Capabilities
Jialin Wu, Kaikai Pan, Yanjiao Chen, Jiangyi Deng, Shengyuan Pang, and Wenyuan Xu

TL;DR
Protego is a detection framework that leverages the intrinsic attention mechanism of Vision Transformers (ViTs) to identify adversarial examples, achieving AUC scores above 0.95 against six attack methods and outperforming existing detection methods.
Contribution
We propose Protego, a novel detection method that uses the attention mechanism of Vision Transformers to identify adversarial examples in computer vision tasks.
Findings
The detector's AUC scores exceed 0.95 for each of six attack methods
Protego outperforms existing adversarial detection methods
High effectiveness demonstrated in experiments
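The findings report detection performance as AUC, presumably the area under the ROC curve of the detector's output score. Assuming the detector assigns each input a scalar score that is higher for inputs it judges adversarial, AUC equals the probability that a randomly drawn adversarial example scores above a randomly drawn clean one (ties counting half). The minimal sketch below computes it that way; the function `detection_auc` and the example scores are hypothetical and do not represent Protego's actual scoring function.

```python
def detection_auc(clean_scores, adv_scores):
    """ROC AUC via the Mann-Whitney formulation (illustrative sketch).

    clean_scores: detector scores for clean inputs (hypothetical).
    adv_scores: detector scores for adversarial inputs (hypothetical).
    Higher scores are assumed to mean "more likely adversarial".
    """
    if not clean_scores or not adv_scores:
        raise ValueError("need at least one clean and one adversarial score")
    wins = 0.0
    # Compare every adversarial score against every clean score.
    for a in adv_scores:
        for c in clean_scores:
            if a > c:
                wins += 1.0  # adversarial example correctly ranked higher
            elif a == c:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(adv_scores) * len(clean_scores))


print(detection_auc([0.1, 0.2, 0.3], [0.4, 0.5, 0.25]))  # → 0.8888888888888888
```

An AUC of 1.0 means every adversarial example outranks every clean one, and 0.5 is chance level, so a value above 0.95 indicates near-complete separation regardless of the decision threshold chosen.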
Abstract
Transformer models have excelled in natural language tasks, prompting the vision community to explore applying them to computer vision problems. However, these models remain susceptible to adversarial examples. In this paper, we investigate the effectiveness of six common adversarial attacks against three pretrained ViT models to reveal the vulnerability of ViT models. To understand and analyse the bias in neural network decisions when the input is adversarial, we use two visualisation techniques: attention rollout and grad attention rollout. To protect ViT models against adversarial attacks, we propose Protego, a detection framework that leverages the transformer's intrinsic capabilities to detect adversarial examples targeting ViT models. This is challenging, however, because adversaries may adopt a wide variety of attack strategies. Inspired by the attention…
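Attention rollout, as commonly formulated, estimates how much each input token contributes to each output token by averaging attention over heads, mixing in an identity matrix to account for the residual connection, re-normalising the rows, and multiplying the resulting matrices across layers. The sketch below assumes head averaging and an equal 0.5/0.5 weighting of attention and identity. The optional `grads` argument illustrates one common gradient-weighted variant (ReLU of gradient times attention); this is an assumption and not necessarily the grad attention rollout formulation used in the paper. The function name `attention_rollout` and its arguments are hypothetical.

```python
import numpy as np


def attention_rollout(attentions, grads=None):
    """Roll attention out across layers (illustrative sketch).

    attentions: list of arrays, one per layer, each of shape
        (num_heads, num_tokens, num_tokens) with softmax-normalised rows.
    grads: optional list of arrays with the same shapes; if given, each
        attention map is replaced by ReLU(grad * attention) before fusion
        (assumed gradient-weighted variant).
    Returns a (num_tokens, num_tokens) matrix whose rows sum to 1.
    """
    num_tokens = attentions[0].shape[-1]
    identity = np.eye(num_tokens)
    rollout = identity.copy()
    for layer, attn in enumerate(attentions):
        if grads is not None:
            # Keep only positive gradient-weighted attention (assumption).
            attn = np.maximum(grads[layer] * attn, 0.0)
        # Fuse heads by averaging.
        fused = attn.mean(axis=0)
        # Model the residual connection as an equal mix with the identity.
        mixed = 0.5 * fused + 0.5 * identity
        # Re-normalise rows so each is a probability distribution.
        mixed = mixed / mixed.sum(axis=-1, keepdims=True)
        # Compose with the layers below: A_l @ rollout_{l-1}.
        rollout = mixed @ rollout
    return rollout
```

For a ViT-B/16 at 224×224 resolution there are 197 tokens (one class token plus 196 patches), so `attention_rollout(attn_per_layer)[0, 1:].reshape(14, 14)` would give a patch-level heat map of what the class token attends to. Because the identity term keeps every row sum positive, the gradient-weighted variant stays well defined even when all weighted attention in a row is clipped to zero.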
Taxonomy
Methods: Softmax · Attention Is All You Need
