Exploring Token Pruning in Vision State Space Models

Zheng Zhan; Zhenglun Kong; Yifan Gong; Yushu Wu; Zichong Meng; Hangyu Zheng; Xuan Shen; Stratis Ioannidis; Wei Niu; Pu Zhao; Yanzhi Wang

arXiv:2409.18962 · cs.CV · September 30, 2024

TL;DR

This paper introduces a novel token pruning method tailored for Vision State Space Models, significantly reducing computation while maintaining high accuracy, and providing insights into SSM-based vision models.

Contribution

The paper proposes a new token pruning approach designed specifically for SSM-based vision models. It addresses the poor performance that results from naively applying ViT token pruning techniques to SSMs, which persists even with extensive fine-tuning, and improves efficiency.
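For context, the ViT-style pruning that the contribution contrasts with can be sketched as keeping the top-scoring fraction of tokens. This is a minimal, hypothetical sketch and not the paper's method: the function name `topk_token_prune`, the `keep_ratio` parameter, and the per-token importance scores are illustrative assumptions (in a ViT such scores might come from class-token attention; here they are simply passed in).

```python
from typing import List, Sequence, Tuple


def topk_token_prune(
    tokens: Sequence[Sequence[float]],
    scores: Sequence[float],
    keep_ratio: float,
) -> Tuple[List[Sequence[float]], List[int]]:
    """Keep the highest-scoring fraction of tokens (hypothetical helper)."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, round(len(tokens) * keep_ratio))
    # Rank token indices by importance, highest score first.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    # Re-sort the survivors so they stay in their original order.
    kept_idx = sorted(ranked[:n_keep])
    return [tokens[i] for i in kept_idx], kept_idx


# Illustrative inputs only (not from the paper).
tokens = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
scores = [0.9, 0.1, 0.5, 0.8, 0.2, 0.7]
kept, kept_idx = topk_token_prune(tokens, scores, keep_ratio=0.5)
print(kept_idx)  # [0, 3, 5]
```

Self-attention is permutation-equivariant, so for a ViT the order of the kept tokens does not change the result (positional information is already baked into the token embeddings); a sequential SSM scan, by contrast, depends on it.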

Findings

1. Achieved 81.7% accuracy on ImageNet with 41.6% FLOPs reduction.
2. Demonstrated significant computational savings with minimal performance loss.
3. Provided new insights into the behavior of SSM-based vision models.

Abstract

State Space Models (SSMs) offer linear computational complexity, in contrast to the quadratic complexity of attention modules in transformers, and have been applied to vision tasks as a new type of powerful vision foundation model. Inspired by the observation that the final prediction in vision transformers (ViTs) relies on only a subset of the most informative tokens, we take the novel step of enhancing the efficiency of SSM-based vision models through token-based pruning. However, directly applying existing token pruning techniques designed for ViTs fails to deliver good performance, even with extensive fine-tuning. To address this issue, we revisit the unique computational characteristics of SSMs and discover that naive application disrupts the sequential token positions. This insight motivates us to design a novel and general token pruning method specifically for SSM-based vision models.…
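The disruption the abstract describes can be seen even in a toy model. The sketch below assumes a scalar, time-invariant linear SSM (h_t = a*h_{t-1} + b*x_t, y_t = c*h_t), far simpler than the selective, input-dependent SSM blocks in Mamba-style vision models; the values of `a`, `b`, `c`, the input, and the dropped token index are arbitrary illustrative choices. The scan makes a single pass over the sequence, so its cost grows linearly with length. Dropping a token changes the output of every later token, because each survivor now receives a hidden state that has decayed over fewer steps, as if it sat at a different position. This only illustrates the problem; it does not implement the paper's remedy.

```python
from typing import List, Sequence


def linear_ssm_scan(x: Sequence[float], a: float, b: float, c: float) -> List[float]:
    """Toy scalar linear SSM: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    A single left-to-right pass, so cost is linear in len(x).
    """
    h = 0.0
    ys: List[float] = []
    for xt in x:
        h = a * h + b * xt  # state update: decay old state, add new input
        ys.append(c * h)    # readout for this position
    return ys


# Illustrative values only (not from the paper).
x = [1.0, 2.0, 3.0, 4.0]
keep = [0, 2, 3]  # hypothetical pruning decision: drop token 1

full = linear_ssm_scan(x, a=0.5, b=1.0, c=1.0)
pruned = linear_ssm_scan([x[i] for i in keep], a=0.5, b=1.0, c=1.0)

print("full scan, kept tokens:", [full[i] for i in keep])  # [1.0, 4.25, 6.125]
print("pruned scan:           ", pruned)                     # [1.0, 3.5, 5.75]
```

Token 0, which precedes the dropped token, is unaffected; tokens 2 and 3 receive 3.5 and 5.75 instead of 4.25 and 6.125.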

Videos

Exploring Token Pruning in Vision State Space Models · SlidesLive

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques

Methods: Softmax · Attention Is All You Need · Pruning