ViT-ReciproCAM: Gradient and Attention-Free Visual Explanations for Vision Transformer

Seok-Yong Byun; Wonju Lee

arXiv:2310.02588 · cs.CV · October 5, 2023 · 1 citation

TL;DR

This paper introduces ViT-ReciproCAM, a visual explanation method for Vision Transformers that requires neither gradients nor attention matrices, with the aim of making predictions easier to interpret and prediction errors easier to debug.

Contribution

It proposes a novel explanation technique for ViT that outperforms existing relevance methods by using token masking and layer output reconstruction, without gradients or attention matrices.
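
The token-masking idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `token_mask_saliency` and `toy_head` are hypothetical, the mask that keeps only the class token plus a single patch token is an assumption made for illustration, and `toy_head` is a stand-in for the layers that follow the target layer in a real ViT.

```python
import numpy as np


def token_mask_saliency(tokens, head_fn, class_idx, grid_hw):
    """Score each patch position by the class score of a masked forward pass.

    tokens:   (1 + N, D) array; row 0 is the class token, rows 1..N are patches.
    head_fn:  callable mapping a (B, 1 + N, D) batch to (B, C) class scores.
              Assumption: it stands in for the layers after the target layer.
    grid_hw:  (height, width) of the patch grid, with height * width == N.
    """
    h, w = grid_hw
    n = h * w
    assert tokens.shape[0] == n + 1, "expected one class token plus h*w patches"

    # Build N masks; mask k keeps the class token and patch k, zeroing the rest.
    keep = np.zeros((n, n + 1, 1), dtype=tokens.dtype)
    keep[:, 0, 0] = 1.0
    keep[np.arange(n), np.arange(n) + 1, 0] = 1.0

    # Broadcast to a batch of N masked copies and run them through the head.
    batch = tokens[None, :, :] * keep
    scores = head_fn(batch)[:, class_idx]

    # Reshape to the patch grid and min-max normalize to [0, 1].
    sal = scores.reshape(h, w)
    lo, hi = sal.min(), sal.max()
    return (sal - lo) / (hi - lo) if hi > lo else np.zeros_like(sal)


def toy_head(batch, weights):
    """Hypothetical stand-in head: sum-pool the tokens, then linear + softmax."""
    logits = batch.sum(axis=1) @ weights
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


# Toy input: a 2x2 patch grid where only patch index 2 carries class-0 evidence.
tokens = np.zeros((5, 4))
tokens[3] = [5.0, 0.0, 0.0, 0.0]
weights = np.array([[1.0, -1.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
saliency = token_mask_saliency(tokens, lambda b: toy_head(b, weights), 0, (2, 2))
```

In this toy setup the saliency map peaks at grid position (1, 0), the only patch that carries evidence for class 0. No gradient or attention matrix is used; the cost is one batched forward pass through the remaining layers with one batch entry per patch.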

Findings

01. Outperforms state-of-the-art relevance methods on the ADCC metric by 4.58% to 5.80%.

02. Generates saliency maps that are more localized and more accurate than those of the compared methods.

03. Provides an efficient, easy-to-implement explanation method for ViT models.

Abstract

This paper presents a novel approach to address the challenges of understanding the prediction process and debugging prediction errors in Vision Transformers (ViT), which have demonstrated superior performance in various computer vision tasks such as image classification and object detection. While several visual explainability techniques, such as CAM, Grad-CAM, Score-CAM, and Recipro-CAM, have been extensively researched for Convolutional Neural Networks (CNNs), limited research has been conducted on ViT. Current state-of-the-art solutions for ViT rely on class-agnostic Attention-Rollout and Relevance techniques. In this work, we propose a new gradient-free visual explanation method for ViT, called ViT-ReciproCAM, which requires neither an attention matrix nor gradient information. ViT-ReciproCAM utilizes token masking and newly generated layer outputs from the target layer's input to…

Code & Models

Repositories

openvinotoolkit/openvino_xai
pytorch

Taxonomy

Topics: Visual Attention and Saliency Detection · Explainable Artificial Intelligence (XAI) · Advanced Neural Network Applications

Methods: Class-activation map