SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer

Xuanyao Chen; Zhijian Liu; Haotian Tang; Li Yi; Hang Zhao; Song Han

arXiv:2303.17605·cs.CV·March 31, 2023·1 cites

TL;DR

SparseViT leverages activation sparsity in window-based vision transformers to achieve significant speedups in high-resolution image processing while maintaining accuracy, through layer-specific pruning and evolutionary search.

Contribution

The paper introduces window-level activation pruning for window-based ViTs, which translates into measured speedup, and assigns layer-specific pruning ratios found by evolutionary search.

Findings

01. ~50% latency reduction with 60% sparsity

02. Achieves 1.3x to 1.5x speedups in various vision tasks

03. Maintains accuracy with negligible loss
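
As a rough sanity check on the first finding, the sketch below compares the reported latency figure with an idealized bound. It assumes, purely for illustration, that latency scales linearly with the fraction of windows kept; the function names are hypothetical and not taken from the paper or its repository.

```python
def speedup_from_latency_reduction(reduction):
    """Speedup factor implied by a fractional latency reduction (0.5 -> 2x)."""
    return 1.0 / (1.0 - reduction)


def ideal_speedup_from_sparsity(sparsity):
    """Upper bound on speedup if latency were exactly proportional to the
    fraction of windows kept (an idealized assumption, not a measurement)."""
    return 1.0 / (1.0 - sparsity)


reported = speedup_from_latency_reduction(0.5)  # ~50% latency reduction
ideal = ideal_speedup_from_sparsity(0.6)        # 60% of windows pruned
print(f"reported: {reported:.2f}x, idealized bound: {ideal:.2f}x")
```

Under that assumption, 60% sparsity could give at most 2.5x, so a ~50% latency reduction (2x) recovers most of the idealized gain. That is consistent with the claim that window pruning translates into actual speedup.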

Abstract

High-resolution images enable neural networks to learn richer visual representations. However, this improved performance comes at the cost of growing computational complexity, hindering their usage in latency-sensitive applications. As not all pixels are equal, skipping computations for less-important regions offers a simple and effective measure to reduce the computation. This, however, is hard to translate into actual speedup for CNNs since it breaks the regularity of the dense convolution workload. In this paper, we introduce SparseViT, which revisits activation sparsity for recent window-based vision transformers (ViTs). As window attentions are naturally batched over blocks, actual speedup with window activation pruning becomes possible: i.e., ~50% latency reduction with 60% sparsity. Different layers should be assigned different pruning ratios due to their diverse…
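
The idea of window activation pruning can be illustrated with a minimal sketch. It scores each non-overlapping window of a feature map and keeps only the highest-scoring ones, so the surviving windows can still be processed as a dense batch. The L2-magnitude score, the single-channel feature map, and the names `window_scores` and `keep_windows` are assumptions made for illustration; this is not the code from the authors' repository.

```python
import math


def window_scores(feature_map, window):
    """Return {(row, col): L2 magnitude} for each non-overlapping window.

    feature_map is a 2D list of floats whose height and width are
    multiples of `window` (single channel, for simplicity).
    """
    height, width = len(feature_map), len(feature_map[0])
    scores = {}
    for top in range(0, height, window):
        for left in range(0, width, window):
            squared = sum(
                feature_map[r][c] ** 2
                for r in range(top, top + window)
                for c in range(left, left + window)
            )
            scores[(top // window, left // window)] = math.sqrt(squared)
    return scores


def keep_windows(scores, sparsity):
    """Return sorted indices of the windows that survive pruning.

    `sparsity` is the fraction of windows to drop; at least one is kept.
    """
    n_keep = max(1, round(len(scores) * (1.0 - sparsity)))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return sorted(ranked[:n_keep])


# Toy 4x4 map split into four 2x2 windows; half of them are pruned.
demo_map = [
    [0.0, 0.0, 3.0, 4.0],
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.5],
]
kept = keep_windows(window_scores(demo_map, window=2), sparsity=0.5)
print(kept)  # [(0, 1), (1, 0)]
```

Because whole windows are dropped rather than individual pixels, the remaining work stays a regular batch of equally sized attention blocks. This is the property the abstract credits for turning sparsity into real speedup.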

Code & Models

Repositories

mit-han-lab/sparsevit
pytorch

Taxonomy

Topics: CCD and CMOS Imaging Sensors · Image Enhancement Techniques · Advanced Neural Network Applications

Methods: Pruning · Convolution