ViT-ReciproCAM: Gradient and Attention-Free Visual Explanations for Vision Transformer
Seok-Yong Byun, Wonju Lee

TL;DR
This paper introduces ViT-ReciproCAM, a visual explanation method for Vision Transformers (ViT) that requires neither gradients nor attention matrices, with the aim of making predictions easier to interpret and debug.
Contribution
The paper proposes an explanation technique for ViT that combines token masking with newly generated layer outputs, uses no gradients or attention matrices, and outperforms existing relevance methods.
Findings
Outperforms state-of-the-art relevance methods on the ADCC metric by 4.58% to 5.80%.
Generates more localized and accurate saliency maps.
Provides an efficient, easy-to-implement explanation method for ViT models.
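The ADCC figure above is not defined on this page. As a rough aid, the sketch below assumes the commonly used definition of ADCC as the harmonic mean of three per-image quantities: coherency, one minus complexity, and one minus average drop, each in [0, 1]. The function name `adcc` and its argument names are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def adcc(coherency: float, complexity: float, average_drop: float) -> float:
    """Harmonic mean of coherency, (1 - complexity), and (1 - average_drop).

    Assumption: all three inputs are already normalized to [0, 1].
    Higher coherency is better; lower complexity and average drop are better.
    """
    terms = (coherency, 1.0 - complexity, 1.0 - average_drop)
    # A harmonic mean collapses to zero if any single term is zero.
    if any(t <= 0.0 for t in terms):
        return 0.0
    return 3.0 / sum(1.0 / t for t in terms)


# Hypothetical scores for one image, for illustration only.
example_score = adcc(coherency=0.5, complexity=0.5, average_drop=0.5)
```

Because it is a harmonic mean, the score is dominated by the weakest of the three terms, so a saliency map cannot compensate for a large confidence drop by being very sparse.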
Abstract
This paper presents a novel approach to address the challenges of understanding the prediction process and debugging prediction errors in Vision Transformers (ViT), which have demonstrated superior performance in various computer vision tasks such as image classification and object detection. While several visual explainability techniques, such as CAM, Grad-CAM, Score-CAM, and Recipro-CAM, have been extensively researched for Convolutional Neural Networks (CNNs), limited research has been conducted on ViT. Current state-of-the-art solutions for ViT rely on class-agnostic Attention-Rollout and Relevance techniques. In this work, we propose a new gradient-free visual explanation method for ViT, called ViT-ReciproCAM, which requires neither an attention matrix nor gradient information. ViT-ReciproCAM utilizes token masking and generated new layer outputs from the target layer's input to…
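The abstract is cut off before it describes the procedure, so the following is only a generic illustration of how token masking can yield a saliency map without gradients or attention matrices. It is not the authors' implementation. Assumptions: the class token sits at index 0, masking means zeroing all other patch tokens, and `tail_fn` is a hypothetical callable standing in for the target layer and everything after it, returning class scores.

```python
import numpy as np


def token_masking_saliency(tokens, tail_fn, grid_hw, class_idx):
    """Score each patch token by the class score it produces on its own.

    tokens:    (1 + H*W, D) array fed to the target layer, class token first.
    tail_fn:   hypothetical callable, (B, 1 + H*W, D) -> (B, num_classes).
    grid_hw:   (H, W) layout of the patch tokens.
    class_idx: index of the class to explain.
    """
    h, w = grid_hw
    n = h * w
    assert tokens.shape[0] == n + 1, "expected one class token plus H*W patches"

    # One masked copy per patch: keep the class token and a single patch token,
    # zero out every other patch token (assumed masking rule).
    batch = np.zeros((n,) + tokens.shape, dtype=tokens.dtype)
    batch[:, 0] = tokens[0]
    idx = np.arange(n)
    batch[idx, idx + 1] = tokens[idx + 1]

    # A single forward pass through the rest of the network, no gradients needed.
    scores = tail_fn(batch)[:, class_idx]
    saliency = scores.reshape(h, w)

    # Min-max normalize to [0, 1] for display.
    lo, hi = saliency.min(), saliency.max()
    return (saliency - lo) / (hi - lo) if hi > lo else np.zeros_like(saliency)


# Toy demo: the "network tail" just sums token features, so patch k scores k.
toy_tokens = np.array([[0, 0], [0, 0], [1, 0], [2, 0], [3, 0]], dtype=float)
toy_tail = lambda b: b.sum(axis=1)
toy_saliency = token_masking_saliency(toy_tokens, toy_tail, (2, 2), class_idx=0)
```

In a real ViT, `tail_fn` would wrap the remaining transformer blocks and the classification head, and the masking rule might keep a small spatial neighborhood rather than a single token.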
Taxonomy
Topics: Visual Attention and Saliency Detection · Explainable Artificial Intelligence (XAI) · Advanced Neural Network Applications
Methods: Class-activation map
