How Does Attention Work in Vision Transformers? A Visual Analytics Attempt

Yiran Li; Junpeng Wang; Xin Dai; Liang Wang; Chin-Chia Michael Yeh; Yan Zheng; Wei Zhang; Kwan-Liu Ma

arXiv:2303.13731 · cs.LG · March 27, 2023 · 1 citation

TL;DR

This paper employs visual analytics to interpret vision transformers by analyzing head importance, spatial attention distribution, and learned patterns, thereby deepening understanding of their inner workings.

Contribution

It introduces a comprehensive visual analytics framework to interpret ViT attention mechanisms, including head importance metrics, spatial attention profiling, and pattern summarization.

Findings

01 Identifies important attention heads using pruning-based metrics

02 Profiles spatial attention distributions within heads

03 Summarizes learned attention patterns with autoencoders

Abstract

Vision transformer (ViT) expands the success of transformer models from sequential data to images. The model decomposes an image into many smaller patches and arranges them into a sequence. Multi-head self-attention is then applied to the sequence to learn the attention between patches. Despite many successful interpretations of transformers on sequential data, little effort has been devoted to the interpretation of ViTs, and many questions remain unanswered. For example, among the numerous attention heads, which ones are more important? How strongly do individual patches attend to their spatial neighbors in different heads? What attention patterns have individual heads learned? In this work, we answer these questions through a visual analytics approach. Specifically, we first identify which heads are more important in ViTs by introducing multiple pruning-based metrics. Then, we…
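The pipeline the abstract describes (split an image into patches, arrange them into a sequence, compute per-head attention between patches) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the random projection weights, and the `neighbor_attention` measure are all hypothetical, and a real ViT also adds a class token, a learned patch embedding, and position embeddings, which are omitted here.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into a row-major sequence of flattened patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)            # (gh, gw, patch, patch, c)
    return x.reshape(gh * gw, patch * patch * c)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_weights(tokens, w_q, w_k, num_heads):
    """Scaled dot-product attention weights, shape (num_heads, N, N)."""
    n = tokens.shape[0]
    q, k = tokens @ w_q, tokens @ w_k         # (N, d_model) each
    d_head = q.shape[1] // num_heads
    q = q.reshape(n, num_heads, d_head).transpose(1, 0, 2)
    k = k.reshape(n, num_heads, d_head).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    return softmax(scores, axis=-1)

def neighbor_attention(attn, grid_w, radius=1):
    """Hypothetical measure (not from the paper): per head, the mean share of
    attention a patch gives to patches within `radius` grid steps, itself included."""
    n = attn.shape[-1]
    rows, cols = np.divmod(np.arange(n), grid_w)
    dist = np.maximum(np.abs(rows[:, None] - rows[None, :]),
                      np.abs(cols[:, None] - cols[None, :]))
    return (attn * (dist <= radius)).sum(axis=-1).mean(axis=-1)

# Usage with random data and random (untrained) projections, assumed sizes.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
tokens = patchify(image, 8)                   # 4 x 4 grid of 8 x 8 patches
w_q = rng.normal(size=(tokens.shape[1], 64)) * 0.05
w_k = rng.normal(size=(tokens.shape[1], 64)) * 0.05
attn = attention_weights(tokens, w_q, w_k, num_heads=4)
print(attn.shape, neighbor_attention(attn, grid_w=4))
```

With trained weights, a head whose neighbor share is close to 1 attends mostly locally, while a lower value indicates attention spread across the image; with the random weights used here the numbers carry no meaning beyond showing the shapes involved.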

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Visual perception and processing mechanisms

Methods: Visual Analytics