Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers

Cong Wei, Brendan Duke, Ruowei Jiang, Parham Aarabi, Graham W. Taylor, Florian Shkurti

arXiv:2303.13755 · cs.CV · March 27, 2023 · 1 citation

TL;DR

Sparsifiner learns instance-dependent, sparse attention patterns for Vision Transformers using a lightweight connectivity predictor, substantially reducing computational cost while maintaining high accuracy.

Contribution

The paper proposes a new approach to learn unstructured, instance-dependent attention masks in Vision Transformers (ViTs), enabling efficient sparse attention with minimal accuracy loss.
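A binary attention mask of this kind can be applied to standard scaled dot-product attention by excluding masked-out pairs before the softmax. The sketch below is a single-head NumPy illustration under our own assumptions: the function name `masked_attention` and the toy random mask are hypothetical, not taken from the Sparsifiner code. It still computes the full N×N score matrix, so it only shows the semantics; real FLOPs savings require a kernel that evaluates only the kept query-key pairs.

```python
import numpy as np

def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention restricted to pairs where mask is True.

    q, k, v: arrays of shape (N, d); mask: boolean array of shape (N, N).
    Assumes every query row of mask keeps at least one key.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # (N, N) scaled dot products
    scores = np.where(mask, scores, -np.inf)      # remove pruned token-to-token connections
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)                      # exp(-inf) = 0, so masked pairs get zero weight
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
N, d = 6, 8
q, k, v = (rng.standard_normal((N, d)) for _ in range(3))
mask = np.eye(N, dtype=bool) | (rng.random((N, N)) < 0.3)  # hypothetical toy mask: self plus random links
out, w = masked_attention(q, k, v, mask)
```

Keeping the diagonal in the toy mask guarantees that every query has at least one key, which the softmax normalization relies on.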

Findings

01

Reduces the FLOPs of multi-head self-attention (MHSA) by 48% to 69% with an accuracy drop of less than 0.4%.

02

Reduces overall ViT FLOPs by over 60% when attention sparsity is combined with token sparsity.

03

Achieves a better trade-off between FLOPs and top-1 accuracy (a more favorable Pareto front) than fixed-pattern sparsity methods.
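To see how token sparsity and attention sparsity compound, one can count the query-key pairs a single attention head evaluates. This is a back-of-the-envelope sketch with hypothetical settings: the 70% token keep ratio, the 32 keys per query, and the helper name `attention_pairs` are illustrative choices, not the paper's configuration. It counts only attention-score entries and ignores the QKV and output projections and the MLP, so its ratios are not comparable to the FLOPs figures above.

```python
from typing import Optional

def attention_pairs(num_tokens: int,
                    token_keep_ratio: float = 1.0,
                    keys_per_query: Optional[int] = None) -> int:
    """Count the query-key pairs evaluated by one attention head (hypothetical helper).

    token_keep_ratio < 1 models token sparsity: pruned tokens are dropped entirely.
    keys_per_query models attention sparsity: each remaining query attends to at
    most this many of the remaining keys.
    """
    kept = max(1, round(num_tokens * token_keep_ratio))
    keys = kept if keys_per_query is None else min(keys_per_query, kept)
    return kept * keys


N = 197  # 14 x 14 patches plus [CLS] for a 224 x 224 image with 16 x 16 patches
dense = attention_pairs(N)                                           # 197 * 197 = 38809
token_only = attention_pairs(N, token_keep_ratio=0.7)               # 138 * 138 = 19044
attn_only = attention_pairs(N, keys_per_query=32)                   # 197 * 32 = 6304
both = attention_pairs(N, token_keep_ratio=0.7, keys_per_query=32)  # 138 * 32 = 4416
print(f"token only {token_only / dense:.1%}, "
      f"attention only {attn_only / dense:.1%}, both {both / dense:.1%}")
```

The reductions compound: token sparsity shrinks the number of queries (and the pool of candidate keys), while attention sparsity caps how many keys each remaining query visits.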

Abstract

Vision Transformers (ViTs) have shown competitive performance compared to convolutional neural networks (CNNs), though they often come with high computational costs. To this end, previous methods explore different attention patterns that restrict each token to a fixed number of spatially nearby tokens to accelerate the ViT's multi-head self-attention (MHSA) operations. However, such structured attention patterns limit token-to-token connections to their spatial relevance, disregarding the semantic connections learned by a full attention mask. In this work, we propose a novel approach to learn instance-dependent attention patterns by devising a lightweight connectivity predictor module that estimates the connectivity score of each pair of tokens. Intuitively, two tokens have a high connectivity score if their features are considered relevant either spatially or semantically. As each…
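A connectivity predictor of the kind described above scores every token pair cheaply and turns those scores into a sparse mask. The following NumPy sketch assumes one plausible design: scores come from low-rank (rank r) projections of the token features, and the mask keeps the `keep_k` highest-scoring keys for each query. The low-rank projections, the top-k rule, and every name here (`predict_connectivity_mask`, `w_q_low`, `w_k_low`, `keep_k`) are illustrative assumptions, not the paper's exact predictor or its training procedure.

```python
import numpy as np

def predict_connectivity_mask(x, w_q_low, w_k_low, keep_k):
    """Hypothetical lightweight connectivity predictor (illustrative, not the paper's).

    x: token features, shape (N, D).
    w_q_low, w_k_low: low-rank projections, shape (D, r) with r much smaller than D.
    Returns a boolean (N, N) mask keeping the keep_k highest-scoring keys per query.
    """
    scores = (x @ w_q_low) @ (x @ w_k_low).T  # (N, N) connectivity scores at rank r
    # argpartition on negated scores puts the keep_k largest scores in the first keep_k slots.
    top = np.argpartition(-scores, keep_k - 1, axis=-1)[:, :keep_k]
    mask = np.zeros(scores.shape, dtype=bool)
    np.put_along_axis(mask, top, True, axis=-1)
    return mask


rng = np.random.default_rng(0)
N, D, r, keep_k = 10, 32, 4, 3
x = rng.standard_normal((N, D))
w_q_low = rng.standard_normal((D, r)) / np.sqrt(D)  # random stand-ins for learned weights
w_k_low = rng.standard_normal((D, r)) / np.sqrt(D)
mask = predict_connectivity_mask(x, w_q_low, w_k_low, keep_k)  # each row has exactly keep_k True entries
```

Scoring all pairs at rank r costs on the order of N²·r multiply-adds rather than the N²·d of a full query-key product with head dimension d, which is what would keep such a predictor lightweight; the resulting mask can then restrict which query-key pairs the attention layer evaluates.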

Code & Models

Repositories

lim142857/Sparsifiner (PyTorch)

Taxonomy

Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Domain Adaptation and Few-Shot Learning