Adaptive Sparse ViT: Towards Learnable Adaptive Token Pruning by Fully Exploiting Self-Attention

Xiangcheng Liu; Tianyi Wu; Guodong Guo

arXiv:2209.13802 · cs.CV · July 7, 2023 · 1 citation

TL;DR

This paper introduces an adaptive token pruning method for Vision Transformers that dynamically discards unimportant tokens, significantly improving inference speed with minimal accuracy loss.

Contribution

It proposes a learnable, threshold-based token pruning framework that adaptively balances accuracy and computational complexity during inference.
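To make the accuracy and complexity trade-off concrete, the following is a minimal sketch of how the cost of one self-attention layer scales with the number of tokens that survive pruning. The function name `attention_macs` and the values 196 tokens and 384 channels are illustrative assumptions, not figures taken from the paper, and the count ignores the linear projections and the MLP, which scale linearly with the token count.

```python
def attention_macs(num_tokens: int, dim: int) -> int:
    """Approximate multiply-accumulates for one self-attention layer.

    Counts only the two token-by-token products (Q @ K^T and attn @ V),
    each costing num_tokens * num_tokens * dim multiply-accumulates.
    Hypothetical helper for illustration; not from the paper's code.
    """
    return 2 * num_tokens * num_tokens * dim


# Illustrative sizes (assumed): 196 patch tokens, 384 channels.
full = attention_macs(196, 384)
half = attention_macs(98, 384)

# Keeping half of the tokens leaves one quarter of this quadratic term.
ratio = half / full
print(f"full={full} half={half} ratio={ratio}")
```

Because this term is quadratic, every token removed early in the network reduces the attention cost of all later layers by more than its share of the sequence.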

Findings

01. Increases DeiT-S throughput by 50%
02. Loses only 0.2% top-1 accuracy
03. Outperforms previous pruning methods in the accuracy-latency trade-off

Abstract

Vision transformers have emerged as a new paradigm in computer vision, showing excellent performance but at a high computational cost. Image token pruning is one of the main approaches to ViT compression, because the complexity is quadratic in the number of tokens and many tokens containing only background regions do not truly contribute to the final prediction. Existing works either rely on additional modules to score the importance of individual tokens or apply a fixed-ratio pruning strategy to all input instances. In this work, we propose an adaptive sparse token pruning framework with minimal cost. Specifically, we first propose an inexpensive class attention scoring mechanism weighted by attention head importance. Then, learnable parameters are inserted as thresholds to distinguish informative tokens from unimportant ones. By…
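The two steps named in the abstract, head-weighted class attention scoring followed by threshold-based pruning, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names `score_tokens` and `prune_tokens` are hypothetical, the head importance weights are supplied by hand rather than computed as in the paper, and the threshold is a fixed constant here, whereas the paper learns it during training.

```python
from typing import List


def score_tokens(cls_attn: List[List[float]], head_weights: List[float]) -> List[float]:
    """Combine per-head class-token attention into one score per patch token.

    cls_attn[h][n] is the attention the class token pays to patch token n
    in head h. head_weights[h] is an importance weight for head h; how the
    paper derives these weights is not reproduced here (assumption: given).
    """
    total = sum(head_weights)
    norm = [w / total for w in head_weights]  # weights sum to 1
    num_tokens = len(cls_attn[0])
    return [
        sum(norm[h] * cls_attn[h][n] for h in range(len(cls_attn)))
        for n in range(num_tokens)
    ]


def prune_tokens(scores: List[float], threshold: float) -> List[int]:
    """Return the indices of tokens whose score reaches the threshold."""
    return [n for n, s in enumerate(scores) if s >= threshold]


# Toy input: 2 heads, 4 patch tokens; each row sums to 1.
cls_attn = [
    [0.50, 0.30, 0.15, 0.05],
    [0.10, 0.20, 0.30, 0.40],
]
head_weights = [3.0, 1.0]  # head 0 is assumed to matter three times as much

scores = score_tokens(cls_attn, head_weights)
kept = prune_tokens(scores, threshold=0.2)  # fixed stand-in for a learned threshold
print(scores, kept)
```

Since the threshold applies to scores rather than to a fixed keep ratio, the number of surviving tokens varies from input to input, which is the adaptive behaviour the abstract contrasts with fixed-ratio pruning.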

Code & Models

Repositories

cydia2018/as-vit (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques

Methods: Pruning · Class Attention