Vision Transformer Pruning

Mingjian Zhu; Yehui Tang; Kai Han

arXiv:2104.08500 · cs.CV · August 17, 2021 · 53 citations

TL;DR

This paper introduces a pruning method for vision transformers that reduces model size and computation by identifying and removing less important dimensions, enabling efficient deployment on mobile devices.

Contribution

It proposes a novel dimension-wise sparsity regularization and pruning pipeline specifically designed for vision transformers, improving efficiency without significant accuracy loss.
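The dimension-wise sparsity regularization can be illustrated with a minimal sketch. It assumes a common formulation in which each layer's input features are multiplied element-wise by a learnable score vector, and an L1 penalty on those scores is added to the task loss so that the scores of unimportant dimensions are pushed toward zero. The names `gate_features`, `l1_penalty`, `regularized_loss`, `l1_subgradient`, and the penalty weight `lam` are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import List

Matrix = List[List[float]]  # rows are tokens, columns are feature dimensions


def gate_features(x: Matrix, scores: List[float]) -> Matrix:
    """Scale feature dimension j of every token by the learnable score scores[j]."""
    return [[v * a for v, a in zip(token, scores)] for token in x]


def l1_penalty(scores: List[float], lam: float) -> float:
    """lam * sum_j |a_j|: drives the scores of unimportant dimensions toward zero."""
    return lam * sum(abs(a) for a in scores)


def regularized_loss(task_loss: float, scores_per_layer: List[List[float]], lam: float) -> float:
    """Training objective: task loss plus the L1 penalty summed over every gated layer."""
    return task_loss + sum(l1_penalty(s, lam) for s in scores_per_layer)


def l1_subgradient(scores: List[float], lam: float) -> List[float]:
    """Penalty's contribution to d(loss)/d(a_j): lam * sign(a_j), taken as 0 at a_j == 0."""
    return [lam * ((a > 0) - (a < 0)) for a in scores]


tokens = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
scores = [1.0, 0.0, 0.5]
print(gate_features(tokens, scores))  # a zero score silences dimension 1 in every token
print(regularized_loss(2.0, [scores], lam=0.1))
```

Because the L1 subgradient has constant magnitude `lam` regardless of how small a score already is, it keeps pulling small scores toward zero, which is why a sparse set of important dimensions tends to emerge during training.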

Findings

01

Achieves high pruning ratios with minimal accuracy drop.

02

Reduces parameters and FLOPs effectively on ImageNet.

03

Demonstrates practical deployment benefits on mobile devices.
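Why removing dimensions reduces parameters and FLOPs can be shown with simple arithmetic. This is a sketch assuming a single linear projection with bias and counting multiply-accumulates (MACs) over all tokens of one image; the sizes used (384 input dimensions, 1536 output dimensions, 197 tokens) are hypothetical values in the range of a small ViT, not results reported in the paper.

```python
def linear_params(d_in: int, d_out: int, bias: bool = True) -> int:
    """Weights of a d_in -> d_out projection, plus d_out bias terms if present."""
    return d_in * d_out + (d_out if bias else 0)


def linear_macs(n_tokens: int, d_in: int, d_out: int) -> int:
    """Multiply-accumulates needed to apply the projection to every token once."""
    return n_tokens * d_in * d_out


# Hypothetical sizes for illustration only, not taken from the paper.
N_TOKENS, D_IN, D_OUT = 197, 384, 1536
KEPT = D_IN // 2  # suppose half of the input dimensions survive pruning

full_p, pruned_p = linear_params(D_IN, D_OUT), linear_params(KEPT, D_OUT)
full_m, pruned_m = linear_macs(N_TOKENS, D_IN, D_OUT), linear_macs(N_TOKENS, KEPT, D_OUT)

print(f"params: {full_p} -> {pruned_p} ({pruned_p / full_p:.1%} kept)")
print(f"MACs:   {full_m} -> {pruned_m} ({pruned_m / full_m:.1%} kept)")
```

Both counts scale linearly with the number of kept input dimensions. Halving `d_in` therefore halves the MACs exactly and nearly halves the parameters, because the bias terms depend only on `d_out`.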

Abstract

Vision transformers have achieved competitive performance on a variety of computer vision applications. However, their storage, run-time memory, and computational demands hinder their deployment on mobile devices. Here we present a vision transformer pruning approach that identifies the impact of the dimensions in each transformer layer and then prunes accordingly. By encouraging dimension-wise sparsity in the transformer, important dimensions emerge automatically. A large number of dimensions with small importance scores can then be discarded to achieve a high pruning ratio without significantly compromising accuracy. The pipeline for vision transformer pruning is as follows: 1) training with sparsity regularization; 2) pruning dimensions of linear projections; 3) fine-tuning. The reductions in parameters and FLOPs achieved by the proposed algorithm are evaluated and analyzed on…
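
Step 2 of the pipeline, pruning dimensions of linear projections, can be sketched under a few stated assumptions. Dimensions are ranked by the magnitude of their learned scores, and a fixed fraction with the smallest scores is dropped. The abstract does not say how the cutoff is chosen, so the `prune_ratio` criterion is an assumption. The matching input columns are then removed from a projection weight stored as [d_out][d_in]. All function names here are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # projection weight stored as [d_out][d_in]


def select_dims(scores: List[float], prune_ratio: float) -> List[int]:
    """Indices to keep: drop floor(prune_ratio * d) dimensions with the smallest |score|."""
    d = len(scores)
    n_keep = d - int(prune_ratio * d)
    ranked = sorted(range(d), key=lambda j: abs(scores[j]), reverse=True)
    return sorted(ranked[:n_keep])  # restore original order so columns stay aligned


def prune_linear_inputs(weight: Matrix, keep: List[int]) -> Matrix:
    """Remove the input columns of a linear projection whose dimensions were pruned."""
    return [[row[j] for j in keep] for row in weight]


def fold_scores(weight: Matrix, scores: List[float]) -> Matrix:
    """Absorb the gating into the weight: W (x * a) == (W diag(a)) x, so column j scales by a_j."""
    return [[w * a for w, a in zip(row, scores)] for row in weight]


scores = [0.9, 0.01, 0.5, 0.0]
weight = [[1.0, 2.0, 3.0, 4.0],
          [5.0, 6.0, 7.0, 8.0]]

keep = select_dims(scores, prune_ratio=0.5)
pruned = prune_linear_inputs(weight, keep)
folded = fold_scores(pruned, [scores[j] for j in keep])
print(keep, pruned, folded)
```

A dimension whose score is near zero contributes almost nothing to `W (x * a)`, so deleting its column changes the output very little. In a full model, the layer that produces these features would need its matching output rows removed as well, a detail this sketch omits. Step 3, fine-tuning, then continues training the smaller model to recover accuracy.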

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection

Methods: Multi-Head Attention · Attention Is All You Need · Pruning · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Vision Transformer · Softmax · Layer Normalization · Label Smoothing