Dynamic Token Pruning in Plain Vision Transformers for Semantic Segmentation
Quan Tang, Bowen Zhang, Jiajun Liu, Fagui Liu, Yifan Liu

TL;DR
This paper introduces Dynamic Token Pruning (DToP), a method for plain vision transformers in semantic segmentation. It reduces computational cost by 20-35% on average by letting easy tokens exit the network early according to their difficulty, an approach inspired by the coarse-to-fine way humans segment images.
Contribution
The paper proposes a novel dynamic token pruning approach that lets easy tokens exit early in vision transformers for semantic segmentation, reducing computation while maintaining accuracy.
Findings
Reduces computational cost by 20-35% on average.
Maintains segmentation accuracy despite pruning.
Adapts computation dynamically to the difficulty of each token in the input.
Abstract
Vision transformers have achieved leading performance on various visual tasks yet still suffer from high computational complexity. The situation worsens in dense prediction tasks like semantic segmentation, as high-resolution inputs and outputs usually imply more tokens involved in computations. Directly removing the less attentive tokens has been discussed for the image classification task but cannot be extended to semantic segmentation, since a dense prediction is required for every patch. To this end, this work introduces a Dynamic Token Pruning (DToP) method based on the early exit of tokens for semantic segmentation. Motivated by the coarse-to-fine segmentation process by humans, we naturally split the widely adopted auxiliary-loss-based network architecture into several stages, where each auxiliary block grades every token's difficulty level. We can finalize the prediction of…
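The staged early-exit idea described in the abstract can be illustrated with a small sketch. It assumes that each auxiliary head grades a token by its maximum softmax confidence and that tokens reaching a fixed confidence threshold have their prediction finalized and are dropped from later stages. The names `dynamic_token_pruning`, `stages`, `aux_heads` and `threshold` are hypothetical, and the stages and heads are toy placeholders for transformer blocks and segmentation heads, not the authors' implementation.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable softmax over one token's class logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_token_pruning(
    tokens: List[Vector],
    stages: List[Callable[[List[Vector]], List[Vector]]],
    aux_heads: List[Callable[[Vector], Vector]],
    threshold: float = 0.95,  # hypothetical confidence cut-off for "easy" tokens
) -> Tuple[List[Optional[int]], List[Optional[int]]]:
    """Toy staged early exit (assumed confidence-threshold grading).

    Each stage only processes tokens that are still active; after the stage,
    its auxiliary head grades every active token. Confident (easy) tokens get
    their class finalized and leave; the rest continue. All tokens remaining
    at the last stage are finalized there.
    """
    n = len(tokens)
    labels: List[Optional[int]] = [None] * n
    exit_stage: List[Optional[int]] = [None] * n
    active = list(range(n))  # original indices of tokens still in the network
    feats = list(tokens)

    for s, (stage, head) in enumerate(zip(stages, aux_heads)):
        feats = stage(feats)  # cost of this stage scales with len(active)
        is_last = s == len(stages) - 1
        keep_idx: List[int] = []
        keep_feats: List[Vector] = []
        for idx, f in zip(active, feats):
            probs = softmax(head(f))
            conf = max(probs)
            if is_last or conf >= threshold:
                labels[idx] = probs.index(conf)  # finalize prediction
                exit_stage[idx] = s
            else:
                keep_idx.append(idx)
                keep_feats.append(f)
        active, feats = keep_idx, keep_feats
        if not active:  # every token has exited early
            break
    return labels, exit_stage


# Usage: two classes, features double as logits, each stage sharpens them.
double = lambda feats: [[2.0 * x for x in f] for f in feats]
identity_head = lambda f: f
tokens = [[4.0, 0.0], [0.5, 0.0], [0.0, 0.2]]
labels, exits = dynamic_token_pruning(
    tokens, [double, double], [identity_head, identity_head]
)
# labels -> [0, 0, 1], exits -> [0, 1, 1]
```

Here the first token is already confident after stage 0 and skips stage 1, so the second stage processes only two of the three tokens. Any further refinements the paper applies beyond this simple threshold rule are not modeled.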
Taxonomy
Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Domain Adaptation and Few-Shot Learning
Methods: Pruning
