Generic Attention-model Explainability by Weighted Relevance Accumulation

Yiming Huang; Aozhe Jia; Xiaodan Zhang; Jiawei Zhang

arXiv:2308.10240 · cs.CV · August 22, 2023

TL;DR

This paper introduces a weighted relevance accumulation method for explaining attention in transformer models. By taking token importance into account, it reduces the distortion of equal accumulation. The method is validated on vision-language tasks using a new CLIP-based model.

Contribution

The paper proposes a weighted relevancy strategy for attention explainability that addresses the limitations of equal relevance accumulation, and introduces CLIPmapper, a CLIP-based two-stage model, for evaluation.

Findings

1. Outperforms existing explainability methods in visual question answering.
2. Effective in reducing relevance distortion during explanation.
3. Validated through extensive perturbation tests.

Abstract

Attention-based transformer models have achieved remarkable progress in multi-modal tasks, such as visual question answering. The explainability of attention-based methods has recently attracted wide interest, as it can explain how attention tokens change inside the model by accumulating relevancy across attention layers. Current methods simply update relevancy by equally accumulating the token relevancy before and after the attention processes. However, the importance of token values usually differs during relevance accumulation. In this paper, we propose a weighted relevancy strategy, which takes the importance of token values into consideration, to reduce the distortion caused by equal relevance accumulation. To evaluate our method, we propose a unified CLIP-based two-stage model, named CLIPmapper, to process Vision-and-Language tasks through a CLIP encoder followed by a mapper. CLIPmapper…
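To make "equally accumulating the token relevancy before and after the attention processes" concrete, here is a minimal sketch. It assumes that the equal-accumulation baseline is the gradient-weighted rule of Chefer et al. (2021). Under that rule, each layer's heads are averaged into A_bar = E_h[(grad A * A)^+], and relevance is updated as R <- R + A_bar R, so the pre-attention relevance R and the attention-mixed relevance A_bar R enter with equal weight. The `token_weights` argument is a hypothetical hook that shows where a token-importance weighting could scale the mixed term. It is an illustration only, not the paper's weighted relevancy formula, which the abstract does not spell out. NumPy is assumed, and the function names are made up for this example.

```python
import numpy as np


def head_averaged_relevance(attn, grad):
    """Gradient-weighted attention map averaged over heads.

    attn, grad: arrays of shape (heads, tokens, tokens) holding one layer's
    attention probabilities and the gradient of the target score w.r.t. them.
    Returns A_bar = mean over heads of (grad * attn) clipped at zero.
    """
    return np.clip(grad * attn, 0.0, None).mean(axis=0)


def accumulate_relevance(attns, grads, token_weights=None):
    """Propagate token relevance through a stack of self-attention layers.

    With token_weights=None this is the equal accumulation R <- R + A_bar @ R.
    token_weights is a HYPOTHETICAL list (one (tokens,) vector per layer) that
    scales the attention-mixed term row by row; it only marks where a
    token-importance weighting could enter and is not the paper's formula.
    """
    n = attns[0].shape[-1]
    R = np.eye(n)  # each token starts fully relevant to itself
    for layer, (attn, grad) in enumerate(zip(attns, grads)):
        a_bar = head_averaged_relevance(attn, grad)
        mixed = a_bar @ R  # relevance carried in by this attention step
        if token_weights is not None:
            mixed = token_weights[layer][:, None] * mixed  # hypothetical weighting
        R = R + mixed  # accumulate before- and after-attention relevance
    return R


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layers, heads, n = 2, 3, 4
    # Random row-stochastic attention maps and random gradients stand in for a real model.
    logits = rng.normal(size=(layers, heads, n, n))
    attns = [np.exp(x) / np.exp(x).sum(axis=-1, keepdims=True) for x in logits]
    grads = [rng.normal(size=(heads, n, n)) for _ in range(layers)]
    R = accumulate_relevance(attns, grads)
    print(R[0, 1:])  # relevance of tokens 1..n-1 w.r.t. token 0 (e.g. a [CLS] token)
```

Setting every weight to 1 reproduces the equal rule, and setting every weight to 0 leaves R at the identity. Both cases make quick sanity checks for any weighting scheme plugged into the hook.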

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Language-Image Pre-training