Attention Guided CAM: Visual Explanations of Vision Transformer Guided by Self-Attention
Saebom Leem, Hyunseok Seo

TL;DR
This paper introduces an attention-guided visualization method for Vision Transformers that leverages gradients and self-attention scores to produce high-quality, localized, semantic explanations of model decisions using only class labels.
Contribution
The paper proposes a novel visualization technique for ViT that combines gradients and self-attention scores to improve explainability and localization performance.
Findings
Outperforms previous ViT explainability methods in localization tasks.
Provides high-level semantic explanations with accurate localization.
Demonstrates faithful model explanations through perturbation tests.
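Perturbation tests of this kind are commonly run by removing image patches in order of their relevance and tracking the model's output. A faithful map should make the score fall quickly when the most relevant patches are removed first (lower area under the curve) and slowly when the least relevant patches are removed first (higher area). This is a minimal sketch of that generic protocol under assumptions, not the paper's evaluation code: the exact masking scheme, step sizes, and metric used by the authors are not given in this summary. `perturbation_curve`, `area_under_curve`, and `toy_score` are hypothetical names, and the toy model stands in for a real ViT.

```python
from typing import Callable, List, Sequence, Set


def perturbation_curve(
    relevance: Sequence[float],
    score_fn: Callable[[Set[int]], float],
    fractions: Sequence[float],
    most_relevant_first: bool = True,
) -> List[float]:
    """Model score after removing a growing fraction of patches.

    score_fn is a hypothetical stand-in for running the model with the given
    patch indices masked out and returning, e.g., the target-class probability.
    """
    # Rank patches by relevance (descending for positive perturbation).
    order = sorted(range(len(relevance)), key=lambda i: relevance[i],
                   reverse=most_relevant_first)
    curve = []
    for frac in fractions:
        k = int(round(frac * len(order)))
        curve.append(score_fn(set(order[:k])))
    return curve


def area_under_curve(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Trapezoidal-rule area under the perturbation curve."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2
               for i in range(len(xs) - 1))


# Toy model whose score drops by each removed patch's "true" importance.
true_importance = [0.4, 0.3, 0.2, 0.1]


def toy_score(removed: Set[int]) -> float:
    return 1.0 - sum(true_importance[i] for i in removed)


fractions = [0.0, 0.25, 0.5, 0.75, 1.0]
explanation = [0.4, 0.3, 0.2, 0.1]  # a relevance map that ranks patches correctly

positive = perturbation_curve(explanation, toy_score, fractions, most_relevant_first=True)
negative = perturbation_curve(explanation, toy_score, fractions, most_relevant_first=False)
auc_positive = area_under_curve(fractions, positive)  # lower is better
auc_negative = area_under_curve(fractions, negative)  # higher is better
print(auc_positive, auc_negative)
```

With a map that ranks patches correctly, the positive-perturbation area comes out smaller than the negative-perturbation area; a poor map would narrow or reverse that gap.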
Abstract
Vision Transformer (ViT) is one of the most widely used models in computer vision owing to its strong performance on various tasks. To fully utilize ViT-based architectures in various applications, visualization methods with good localization performance are necessary, but the methods developed for CNN-based models are still not available for ViT because of its unique structure. In this work, we propose an attention-guided visualization method for ViT that provides a high-level semantic explanation of its decisions. Our method selectively aggregates the gradients propagated directly from the classification output to each self-attention, collecting the contribution of image features extracted from each location of the input image. These gradients are additionally guided by the normalized self-attention scores, which are the pairwise patch correlation scores.…
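The aggregation described in the abstract can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that (i) gradients of the class logit are taken with respect to the pre-softmax attention scores of every head in every layer, (ii) only positive gradients are kept (a ReLU), (iii) the "normalized self-attention scores" come from a sigmoid, and (iv) only the [CLS] query row is used, summed over heads and layers. The function names `attention_guided_map`, `min_max_normalize`, and `to_grid` are hypothetical, and the scores and gradients are supplied as nested lists rather than computed by an autograd framework.

```python
import math
from typing import List

# Nested lists indexed as [layer][head][query_token][key_token]; token 0 is [CLS].
Tensor4D = List[List[List[List[float]]]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def attention_guided_map(scores: Tensor4D, grads: Tensor4D) -> List[float]:
    """Relevance per patch token (tokens 1..N); hypothetical sketch.

    scores: pre-softmax self-attention scores for each layer and head.
    grads:  gradient of the target class logit w.r.t. those scores.
    """
    num_tokens = len(scores[0][0][0])
    relevance = [0.0] * (num_tokens - 1)
    for layer_scores, layer_grads in zip(scores, grads):
        for head_scores, head_grads in zip(layer_scores, layer_grads):
            cls_scores, cls_grads = head_scores[0], head_grads[0]  # [CLS] query row
            for j in range(1, num_tokens):
                positive_grad = max(cls_grads[j], 0.0)  # ReLU on the gradient
                # Guide the gradient by the sigmoid-normalized attention score (assumed).
                relevance[j - 1] += positive_grad * sigmoid(cls_scores[j])
    return relevance


def min_max_normalize(values: List[float]) -> List[float]:
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def to_grid(values: List[float]) -> List[List[float]]:
    """Reshape N patch relevances into a sqrt(N) x sqrt(N) grid."""
    side = math.isqrt(len(values))
    if side * side != len(values):
        raise ValueError("expected a square number of patches")
    return [values[r * side:(r + 1) * side] for r in range(side)]


# Toy example: 1 layer, 1 head, [CLS] + 4 patches (a 2x2 patch grid).
num_tokens = 5
score_rows = [[0.0] * num_tokens for _ in range(num_tokens)]
grad_rows = [[0.0] * num_tokens for _ in range(num_tokens)]
score_rows[0] = [0.0, 0.0, 2.0, -2.0, 0.0]
grad_rows[0] = [0.0, 1.0, 1.0, 1.0, -1.0]

relevance = attention_guided_map([[score_rows]], [[grad_rows]])
heatmap = to_grid(min_max_normalize(relevance))
print(heatmap)
```

In the toy example, the patch with the highest attention score and a positive gradient gets the highest relevance, and the patch with a negative gradient is zeroed out. In a real ViT the scores and gradients would be captured with forward and backward hooks on each attention block, and the grid would be upsampled to the input resolution; both steps are omitted here.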
Taxonomy
Topics: Visual Attention and Saliency Detection · Industrial Vision Systems and Defect Detection
