Dynamic Token Pruning in Plain Vision Transformers for Semantic Segmentation

Quan Tang; Bowen Zhang; Jiajun Liu; Fagui Liu; Yifan Liu

arXiv:2308.01045 · cs.CV · September 29, 2023

TL;DR

This paper introduces a Dynamic Token Pruning method for vision transformers in semantic segmentation, reducing computational costs by 20-35% through early token exit based on difficulty, inspired by human coarse-to-fine segmentation.

Contribution

It proposes a novel dynamic token pruning approach that allows early exit of easy tokens in vision transformers for semantic segmentation, maintaining accuracy while reducing computation.

Findings

1. Reduces computational cost by 20-35% on average.
2. Maintains segmentation accuracy despite pruning.
3. Operates dynamically based on input difficulty.

Abstract

Vision transformers have achieved leading performance on various visual tasks yet still suffer from high computational complexity. The situation deteriorates in dense prediction tasks like semantic segmentation, as high-resolution inputs and outputs usually imply more tokens involved in computations. Directly removing the less attentive tokens has been discussed for the image classification task but cannot be extended to semantic segmentation, since a dense prediction is required for every patch. To this end, this work introduces a Dynamic Token Pruning (DToP) method based on the early exit of tokens for semantic segmentation. Motivated by the coarse-to-fine segmentation process by humans, we naturally split the widely adopted auxiliary-loss-based network architecture into several stages, where each auxiliary block grades every token's difficulty level. We can finalize the prediction of…
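The early-exit idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract says only that each auxiliary block grades every token's difficulty, so the grading rule used here (top-class softmax probability compared against a fixed threshold), the threshold value 0.95, and the names `softmax` and `split_easy_hard` are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def split_easy_hard(token_logits, threshold=0.95):
    """Grade tokens using hypothetical auxiliary-head logits.

    ASSUMPTION: a token counts as "easy" when its top-class probability
    reaches `threshold`; this rule and the default value are illustrative.

    Returns (finalized, active):
      finalized maps token index -> predicted class for tokens that exit early,
      active lists the token indices that continue to the next stage.
    """
    finalized, active = {}, []
    for idx, logits in token_logits.items():
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold:
            finalized[idx] = probs.index(confidence)
        else:
            active.append(idx)
    return finalized, active


# Hypothetical auxiliary-head logits for 4 tokens and 3 classes.
aux_logits = {
    0: [8.0, 0.0, 0.0],   # confident: class 0
    1: [0.5, 0.4, 0.3],   # ambiguous
    2: [0.0, 0.0, 9.0],   # confident: class 2
    3: [1.0, 1.2, 0.9],   # ambiguous
}
finalized, active = split_easy_hard(aux_logits, threshold=0.95)
print(finalized)  # {0: 0, 2: 2}
print(active)     # [1, 3]
```

In this toy input, two of the four tokens receive a final label at the auxiliary stage, so only the two ambiguous tokens would be passed to later layers. Because every token still ends up with a prediction, the output remains dense, which is the requirement that rules out simply discarding tokens as in classification.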

Code & Models

Repositories

zbwxp/Dynamic-Token-Pruning (PyTorch)

Taxonomy

Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Domain Adaptation and Few-Shot Learning

Methods: Pruning