SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer
Xuanyao Chen, Zhijian Liu, Haotian Tang, Li Yi, Hang Zhao, Song Han

TL;DR
SparseViT leverages activation sparsity in window-based vision transformers to achieve significant speedups in high-resolution image processing while maintaining accuracy, through layer-specific pruning and evolutionary search.
Contribution
The paper applies activation sparsity to window-based ViTs by pruning whole windows, which yields measured (not just theoretical) speedups. Each layer receives its own pruning ratio, and the per-layer ratios are chosen by evolutionary search.
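To make the idea of searching for per-layer pruning ratios concrete, here is a minimal sketch of an evolutionary search under a latency budget. Everything specific in it is an assumption made for illustration: the candidate ratios in `CHOICES`, the four-layer setup, the `SENSITIVITY` weights, and the toy `latency` and `accuracy_proxy` functions are hypothetical stand-ins, not the paper's search space or its measured latency and accuracy.

```python
import random

CHOICES = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]  # candidate ratios (assumed)
NUM_LAYERS = 4                                            # toy model depth (assumed)
# Hypothetical per-layer sensitivity: higher means pruning hurts accuracy more.
SENSITIVITY = [1.0, 0.6, 0.3, 0.1]


def latency(cfg):
    """Toy latency proxy: each layer's cost scales with its kept fraction."""
    return sum(1.0 - s for s in cfg) / len(cfg)


def accuracy_proxy(cfg):
    """Toy accuracy proxy: penalize sparsity more in sensitive layers."""
    return 1.0 - sum(w * s * s for w, s in zip(SENSITIVITY, cfg)) / len(cfg)


def mutate(cfg, rng, prob=0.3):
    """Resample each layer's ratio with probability `prob`."""
    return tuple(rng.choice(CHOICES) if rng.random() < prob else s for s in cfg)


def crossover(a, b, rng):
    """Pick each layer's ratio from one of the two parents."""
    return tuple(rng.choice(pair) for pair in zip(a, b))


def evolve(budget, generations=20, pop_size=32, n_parents=8, seed=0):
    rng = random.Random(seed)

    def random_cfg():
        return tuple(rng.choice(CHOICES) for _ in range(NUM_LAYERS))

    def valid(make):
        # Rejection sampling: retry until the candidate meets the latency budget.
        while True:
            cfg = make()
            if latency(cfg) <= budget:
                return cfg

    population = [valid(random_cfg) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the fittest configurations and refill the population from them.
        parents = sorted(population, key=accuracy_proxy, reverse=True)[:n_parents]
        children = []
        while len(children) < pop_size - n_parents:
            if rng.random() < 0.5:
                child = valid(lambda: mutate(rng.choice(parents), rng))
            else:
                child = valid(lambda: crossover(*rng.sample(parents, 2), rng))
            children.append(child)
        population = parents + children
    return max(population, key=accuracy_proxy)


best = evolve(budget=0.6)
print("per-layer sparsity:", best, "latency proxy:", round(latency(best), 3))
```

In a real setting the two proxies would be replaced by measured latency and validation accuracy of the pruned network; the search loop itself would not need to change.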
Findings
~50% latency reduction with 60% sparsity
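Achieves speedups of 1.5x, 1.4x, and 1.3x over the dense counterpart in monocular 3D object detection, 2D instance segmentation, and 2D semantic segmentation, respectively
Incurs negligible to no loss of accuracy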
Abstract
High-resolution images enable neural networks to learn richer visual representations. However, this improved performance comes at the cost of growing computational complexity, hindering their usage in latency-sensitive applications. As not all pixels are equal, skipping computations for less-important regions offers a simple and effective measure to reduce the computation. This, however, is hard to translate into actual speedup for CNNs since it breaks the regularity of the dense convolution workload. In this paper, we introduce SparseViT, which revisits activation sparsity for recent window-based vision transformers (ViTs). As window attentions are naturally batched over blocks, actual speedup with window activation pruning becomes possible: i.e., ~50% latency reduction with 60% sparsity. Different layers should be assigned different pruning ratios due to their diverse…
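The reason window pruning can give real speedup is that window attention already runs as a batch over windows, so dropping windows simply shrinks the batch. The NumPy sketch below illustrates this under stated assumptions: the helper names (`window_partition`, `prune_windows`, `window_self_attention`) are hypothetical, the importance score (mean L2 norm of a window's tokens) is one plausible choice rather than a detail given in this summary, and the attention has no learned weights.

```python
import numpy as np


def window_partition(x, ws):
    """Split a (H, W, C) feature map into non-overlapping ws x ws windows."""
    H, W, C = x.shape
    assert H % ws == 0 and W % ws == 0, "H and W must be multiples of ws"
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    # Result shape: (num_windows, ws * ws, C), windows in row-major order.
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)


def prune_windows(windows, sparsity):
    """Keep the highest-scoring windows; score = mean token L2 norm (assumed)."""
    n = windows.shape[0]
    n_keep = max(1, int(round(n * (1.0 - sparsity))))
    scores = np.linalg.norm(windows, axis=-1).mean(axis=-1)  # shape (n,)
    keep = np.sort(np.argsort(-scores, kind="stable")[:n_keep])
    return windows[keep], keep


def window_self_attention(windows):
    """Toy single-head self-attention, batched over the window axis."""
    d = windows.shape[-1]
    logits = windows @ windows.transpose(0, 2, 1) / np.sqrt(d)
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ windows


rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 16, 4))
feat[:4, :4] *= 10.0  # make the top-left window clearly the most active

windows = window_partition(feat, ws=4)          # 16 windows of 16 tokens each
kept, keep_idx = prune_windows(windows, 0.6)    # 60% sparsity
out = window_self_attention(kept)               # attention over kept windows only
print(windows.shape, kept.shape, out.shape)
```

Because the pruned input is still a dense tensor with a smaller leading dimension, the attention call needs no sparse kernels; `keep_idx` records where the outputs belong if they are later scattered back into the full feature map.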
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Image Enhancement Techniques · Advanced Neural Network Applications
Methods: Pruning · Convolution
