Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers

Siyuan Wei; Tianzhu Ye; Shen Zhang; Yao Tang; Jiajun Liang

arXiv:2304.10716 · cs.CV · April 24, 2023 · 1 citation

TL;DR

This paper introduces a joint token pruning and squeezing method for vision transformers that significantly reduces computational costs while maintaining or improving accuracy, outperforming state-of-the-art techniques.

Contribution

The novel TPS module combines token pruning with information squeezing, enhancing efficiency and robustness in vision transformer compression.

Findings

01. Outperforms state-of-the-art methods across all token pruning levels.

02. Improves accuracy by 1%-6% over baselines on ImageNet classification when the computational budgets of DeiT-tiny and DeiT-small are shrunk to 35%.

03. Enhances throughput and robustness in various transformer models.

Abstract

Although vision transformers (ViTs) have shown promising results in various computer vision tasks recently, their high computational cost limits their practical applications. Previous approaches that prune redundant tokens have demonstrated a good trade-off between performance and computation costs. Nevertheless, errors caused by pruning strategies can lead to significant information loss. Our quantitative experiments reveal that the impact of pruned tokens on performance should be noticeable. To address this issue, we propose a novel joint Token Pruning & Squeezing module (TPS) for compressing vision transformers with higher efficiency. Firstly, TPS adopts pruning to get the reserved and pruned subsets. Secondly, TPS squeezes the information of pruned tokens into partial reserved tokens via the unidirectional nearest-neighbor matching and similarity-based fusing steps. Compared to…
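The two steps named in the abstract (pruning, then squeezing via unidirectional nearest-neighbor matching and similarity-based fusing) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function name `prune_and_squeeze`, the externally supplied importance `scores`, the use of cosine similarity, and the exponential fusing weights are all assumptions made for illustration; the abstract only states that matching is unidirectional and that fusing is similarity-based.

```python
import numpy as np

def prune_and_squeeze(tokens, scores, keep_ratio=0.5):
    """Illustrative prune-then-squeeze on one image's tokens (hypothetical helper).

    tokens: (N, D) array of token features.
    scores: (N,) importance scores, assumed to be given by some scoring rule.
    Returns a (K, D) array: reserved tokens with pruned-token information fused in.
    """
    n = tokens.shape[0]
    k = max(1, int(round(n * keep_ratio)))

    # Step 1: pruning. Split into a reserved subset (top-k scores) and a pruned subset.
    order = np.argsort(-scores, kind="stable")
    reserved, pruned = tokens[order[:k]], tokens[order[k:]]
    if pruned.shape[0] == 0:
        return reserved.copy()

    # Step 2a: unidirectional nearest-neighbor matching (pruned -> reserved only).
    # Several pruned tokens may pick the same reserved "host" token.
    def unit(x):
        return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)

    sim = unit(pruned) @ unit(reserved).T        # (P, K) cosine similarities
    host = sim.argmax(axis=1)                    # host index for each pruned token
    host_sim = sim[np.arange(host.size), host]   # similarity to the chosen host

    # Step 2b: similarity-based fusing (ASSUMED weighting: exp of similarity,
    # with each host weighted by exp(1), its self-similarity).
    w_pruned = np.exp(host_sim)
    w_reserved = np.full(k, np.e)
    num = reserved * w_reserved[:, None]
    den = w_reserved.copy()
    np.add.at(num, host, pruned * w_pruned[:, None])  # accumulate per host
    np.add.at(den, host, w_pruned)
    return num / den[:, None]
```

Calling `prune_and_squeeze(tokens, scores, keep_ratio=0.5)` on an `(N, D)` array returns roughly `N / 2` tokens, ordered by descending score. The output has the same length as plain pruning would give, so downstream cost is unchanged; the difference is that each reserved token becomes a weighted average of itself and the pruned tokens matched to it, rather than the pruned tokens being discarded.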

Code & Models

Repositories

megvii-research/tps-cvpr2023 (PyTorch, official)

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques

Methods: Pruning