IA-RED$^2$: Interpretability-Aware Redundancy Reduction for Vision Transformers

Bowen Pan; Rameswar Panda; Yifan Jiang; Zhangyang Wang; Rogerio Feris; Aude Oliva

arXiv:2106.12620 · cs.CV · October 28, 2021 · 68 cites

TL;DR

The paper introduces IA-RED$^2$, an interpretability-aware framework that reduces redundant computation in vision transformers, delivering up to 1.4x speed-up with less than 0.7% accuracy loss while providing visual evidence of which patches the model relies on.

Contribution

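The paper proposes an interpretability-aware redundancy reduction method for vision transformers. An interpretable module dynamically drops uncorrelated input patches, and the framework extends to a hierarchical structure in which such tokens are removed gradually across stages, improving both efficiency and interpretability.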

Findings

01. Up to 1.4x speed-up on state-of-the-art models
02. Less than 0.7% accuracy loss
03. Enhanced interpretability with visual evidence

Abstract

The self-attention-based model, transformer, is recently becoming the leading backbone in the field of computer vision. In spite of the impressive success made by transformers in a variety of vision tasks, it still suffers from heavy computation and intensive memory costs. To address this limitation, this paper presents an Interpretability-Aware REDundancy REDuction framework (IA-RED$^2$). We start by observing a large amount of redundant computation, mainly spent on uncorrelated input patches, and then introduce an interpretable module to dynamically and gracefully drop these redundant patches. This novel framework is then extended to a hierarchical structure, where uncorrelated tokens at different stages are gradually removed, resulting in a considerable shrinkage of computational cost. We include extensive experiments on both image and video tasks, where our method could deliver up…
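
The hierarchical patch dropping described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the scoring functions below are hypothetical stand-ins for the paper's learned interpretable module, and the names `drop_redundant_tokens`, `hierarchical_reduce`, `mean_abs`, and the 0.5 threshold are assumptions made for illustration only.

```python
from typing import Callable, List, Sequence, Tuple

# A token is (original patch index, feature vector). Keeping the index lets us
# see afterwards which input patches survived, which is what makes the
# dropping decision inspectable.
Token = Tuple[int, List[float]]
ScoreFn = Callable[[List[float]], float]


def drop_redundant_tokens(
    tokens: Sequence[Token], score_fn: ScoreFn, threshold: float = 0.5
) -> List[Token]:
    """Keep only tokens whose score reaches the threshold (assumed rule)."""
    return [tok for tok in tokens if score_fn(tok[1]) >= threshold]


def hierarchical_reduce(
    tokens: Sequence[Token], stages: Sequence[ScoreFn], threshold: float = 0.5
) -> List[List[int]]:
    """Apply one scoring function per stage, dropping tokens gradually.

    Returns the surviving patch indices after each stage.
    """
    survivors = list(tokens)
    history: List[List[int]] = []
    for score_fn in stages:
        survivors = drop_redundant_tokens(survivors, score_fn, threshold)
        history.append([idx for idx, _ in survivors])
    return history


def mean_abs(features: List[float]) -> float:
    """Hypothetical informativeness score: mean absolute feature value."""
    return sum(abs(x) for x in features) / len(features)


# Toy input: four patches with two-dimensional features.
patches: List[Token] = [
    (0, [0.9, 0.8]),
    (1, [0.1, 0.0]),
    (2, [0.6, 0.7]),
    (3, [0.4, 0.5]),
]

# Two stages; the second scales scores down, so it is stricter than the first.
stages: List[ScoreFn] = [mean_abs, lambda f: 0.7 * mean_abs(f)]

history = hierarchical_reduce(patches, stages)
print(history)
```

In this toy run the first stage keeps patches 0 and 2, and the second stage keeps only patch 0. Later transformer blocks would then process fewer tokens, which is where the computational saving would come from, and the surviving indices show which regions of the input were judged informative.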

Videos

IA-RED$^2$: Interpretability-Aware Redundancy Reduction for Vision Transformers · SlidesLive

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Advanced Neural Network Applications · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · TimeSformer · Residual Connection · Layer Normalization · Attention Dropout · Softmax · Dense Connections · Feedforward Network