No Token Left Behind: Efficient Vision Transformer via Dynamic Token Idling

Xuwei Xu; Changlin Li; Yudong Chen; Xiaojun Chang; Jiajun Liu; Sen Wang

arXiv:2310.05654·cs.CV·November 7, 2023

TL;DR

IdleViT is a dynamic token idling method for Vision Transformers. In each layer it computes on only a selected subset of tokens and leaves the rest idle, so idle tokens remain available for re-selection in later layers. A token cut loss improves token selection. The result is lower computational cost with minimal accuracy loss.

Contribution

This paper proposes IdleViT, a novel dynamic token idling approach with a token cut loss, enabling efficient ViT inference without permanently dropping tokens, unlike prior pruning methods.

Findings

1. Reduces ViT complexity by up to 33% with only 0.2% accuracy loss.

2. Outperforms state-of-the-art EViT at a 0.5 keep ratio.

3. Achieves faster inference speed with minimal performance impact.
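
To give a feel for what a keep ratio means in cost terms, the sketch below estimates the multiply-accumulates of a single transformer block with and without idling. It uses the common back-of-envelope count of 12·N·D² + 2·N²·D for N tokens of width D. The token count (196 patches plus a class token), the width (384), the MLP expansion of 4, and the ceil rounding in `kept_tokens` are all assumptions chosen for illustration, not values taken from the paper.

```python
import math


def block_macs(num_tokens: int, dim: int) -> int:
    """Approximate multiply-accumulates for one standard ViT block.

    QKV and output projections: 4 * N * D^2
    attention scores and weighted sum: 2 * N^2 * D
    MLP with hidden size 4D (assumed): 8 * N * D^2
    """
    return 12 * num_tokens * dim * dim + 2 * num_tokens * num_tokens * dim


def kept_tokens(num_patches: int, keep_ratio: float) -> int:
    """Patch tokens selected for computation (ceil rounding is an assumption)."""
    return math.ceil(num_patches * keep_ratio)


# Assumed configuration: 196 patch tokens plus one class token, width 384.
NUM_PATCHES, DIM, KEEP_RATIO = 196, 384, 0.5

full = block_macs(NUM_PATCHES + 1, DIM)
reduced = block_macs(kept_tokens(NUM_PATCHES, KEEP_RATIO) + 1, DIM)
saving = 1 - reduced / full

print(f"tokens computed: {kept_tokens(NUM_PATCHES, KEEP_RATIO) + 1} of {NUM_PATCHES + 1}")
print(f"per-block cost saved: {saving:.1%}")
```

This is a single-block estimate only. It is not the paper's reported whole-model reduction, which depends on which layers apply idling and at what ratio.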

Abstract

Vision Transformers (ViTs) have demonstrated outstanding performance in computer vision tasks, yet their high computational complexity prevents their deployment in computing resource-constrained environments. Various token pruning techniques have been introduced to alleviate the high computational burden of ViTs by dynamically dropping image tokens. However, some undesirable pruning at early stages may result in permanent loss of image information in subsequent layers, consequently hindering model performance. To address this problem, we propose IdleViT, a dynamic token-idle-based method that achieves an excellent trade-off between performance and efficiency. Specifically, in each layer, IdleViT selects a subset of the image tokens to participate in computations while keeping the rest of the tokens idle and directly passing them to this layer's output. By allowing the idle tokens to be…
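
The per-layer behaviour described in the abstract can be illustrated with a minimal, dependency-free sketch. Everything named here is hypothetical: `idle_layer`, `select_indices`, and `toy_block` are illustration-only helpers, the importance scores are supplied by hand rather than computed by any scoring rule from the paper, and `toy_block` merely stands in for a real attention and MLP block. The point is only to show that idle tokens pass through unchanged and can still be selected by a later layer.

```python
import math
from typing import Callable, List, Sequence, Tuple

Token = List[float]


def select_indices(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Return indices of the top-scoring tokens, restored to original order."""
    k = max(1, math.ceil(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def idle_layer(
    tokens: List[Token],
    scores: Sequence[float],
    keep_ratio: float,
    block: Callable[[List[Token]], List[Token]],
) -> Tuple[List[Token], List[int]]:
    """Run `block` on the selected tokens only; idle tokens are copied through."""
    selected = select_indices(scores, keep_ratio)
    updated = block([tokens[i] for i in selected])
    out = [list(t) for t in tokens]  # idle tokens stay unchanged
    for i, new_token in zip(selected, updated):
        out[i] = new_token
    return out, selected


def toy_block(selected_tokens: List[Token]) -> List[Token]:
    """Stand-in for attention + MLP: adds 1.0 to every feature."""
    return [[x + 1.0 for x in tok] for tok in selected_tokens]


tokens = [[0.0], [10.0], [20.0], [30.0]]

# Layer 1 selects tokens 0 and 2; tokens 1 and 3 are idle.
layer1_out, layer1_sel = idle_layer(tokens, [0.9, 0.1, 0.8, 0.2], 0.5, toy_block)

# Layer 2 receives all four tokens, so the previously idle ones can be selected.
layer2_out, layer2_sel = idle_layer(layer1_out, [0.1, 0.9, 0.2, 0.8], 0.5, toy_block)

print(layer1_sel, layer1_out)
print(layer2_sel, layer2_out)
```

Because the output of each layer keeps the full token set, nothing is discarded: a token skipped in the first layer is still available to the second. A pruning scheme would instead shrink the list at each layer.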

Code & Models

Repositories

ackesnal/idlevit (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · CCD and CMOS Imaging Sensors

Methods: Pruning