IA-RED$^2$: Interpretability-Aware Redundancy Reduction for Vision Transformers
Bowen Pan, Rameswar Panda, Yifan Jiang, Zhangyang Wang, Rogerio Feris, Aude Oliva

TL;DR
The paper introduces IA-RED$^2$, an interpretability-aware framework that reduces redundancy in vision transformers, significantly speeding up models with minimal accuracy loss and providing visual interpretability.
Contribution
It proposes a novel interpretability-aware redundancy reduction method for vision transformers, enabling dynamic patch dropping and hierarchical structure extension for efficiency and interpretability.
Findings
Up to 1.4x speed-up on state-of-the-art models
Less than 0.7% accuracy loss
Enhanced interpretability with visual evidence
Abstract
The self-attention-based model, the transformer, has recently become the leading backbone in the field of computer vision. Despite the impressive success of transformers in a variety of vision tasks, they still suffer from heavy computation and intensive memory costs. To address this limitation, this paper presents an Interpretability-Aware REDundancy REDuction framework (IA-RED$^2$). We start by observing a large amount of redundant computation, mainly spent on uncorrelated input patches, and then introduce an interpretable module to dynamically and gracefully drop these redundant patches. This novel framework is then extended to a hierarchical structure, where uncorrelated tokens at different stages are gradually removed, resulting in a considerable shrinkage of computational cost. We include extensive experiments on both image and video tasks, where our method could deliver up…
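The patch dropping and hierarchical removal described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear scoring head, the sigmoid keep-probability, the fixed 0.5 threshold, and the function names `drop_redundant_patches` and `hierarchical_drop` are all assumptions made here for illustration, and the transformer blocks that would run between stages are omitted.

```python
import numpy as np


def drop_redundant_patches(tokens, w, b, threshold=0.5):
    """Score each patch token and keep only those judged informative.

    tokens: array of shape (num_patches, dim)
    w, b:   weights of a hypothetical linear scoring head (assumption,
            standing in for the paper's interpretable module)
    """
    logits = tokens @ w + b
    keep_prob = 1.0 / (1.0 + np.exp(-logits))  # sigmoid keep-probability
    keep_mask = keep_prob >= threshold         # hard keep/drop decision
    return tokens[keep_mask], keep_mask, keep_prob


def hierarchical_drop(tokens, stages, threshold=0.5):
    """Apply patch dropping at several stages, shrinking the token set.

    stages: list of (w, b) pairs, one hypothetical scoring head per stage.
    Returns the surviving tokens and the token count after each stage.
    """
    kept_counts = [tokens.shape[0]]
    for w, b in stages:
        tokens, _, _ = drop_redundant_patches(tokens, w, b, threshold)
        kept_counts.append(tokens.shape[0])
    return tokens, kept_counts


# Toy usage: 196 patch tokens (a 14x14 grid) with 64-dimensional features.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(196, 64))
stages = [(rng.normal(size=64) * 0.1, 0.0) for _ in range(3)]
kept, counts = hierarchical_drop(tokens, stages)
print(counts)  # token count before dropping and after each stage
```

Because later layers only process the surviving tokens, the token count can never grow from one stage to the next, which is where the computational saving comes from. The keep mask of each stage can also be mapped back onto the patch grid to visualize which image regions the model retained.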
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Advanced Neural Network Applications · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · TimeSformer · Residual Connection · Layer Normalization · Attention Dropout · Softmax · Dense Connections · Feedforward Network
