Chasing Sparsity in Vision Transformers: An End-to-End Exploration
Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang

TL;DR
This paper introduces a comprehensive end-to-end approach for sparsifying vision transformers, reducing training and inference costs while maintaining or even improving accuracy through dynamic subnetwork extraction and joint optimization.
Contribution
The paper presents the first unified method for integrating both unstructured and structured sparsity into ViTs, together with a novel learnable token selector that adaptively identifies the most important input patches.
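As a rough illustration of what a token selector does at inference time, the sketch below keeps only the highest-scoring patch tokens. It is a minimal sketch under assumptions, not the paper's implementation: the function name `select_tokens`, the `keep_ratio` parameter, and the hard top-k rule are all hypothetical, and the per-patch scores are taken as given rather than produced by a learned module. How the scores are learned is outside this sketch.

```python
def select_tokens(tokens, scores, keep_ratio):
    """Keep the highest-scoring patch tokens, preserving their original order.

    tokens:     list of patch tokens (any objects)
    scores:     list of importance scores, one per token (assumed to come
                from some learned scoring module, not modeled here)
    keep_ratio: fraction of tokens to keep, in (0, 1]
    """
    num_keep = max(1, int(keep_ratio * len(tokens)))
    # Rank token indices from most to least important.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    # Restore spatial order among the survivors.
    kept = sorted(ranked[:num_keep])
    return [tokens[i] for i in kept], kept


# Hypothetical example: six patches, keep half of them.
patches = ["p0", "p1", "p2", "p3", "p4", "p5"]
importance = [0.1, 0.9, 0.3, 0.8, 0.05, 0.7]
kept_patches, kept_idx = select_tokens(patches, importance, keep_ratio=0.5)
print(kept_patches)  # ['p1', 'p3', 'p5']
```

Dropping tokens this way shortens the sequence that later layers must process, which is where the data-side savings would come from.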
Findings
Significant reduction in computational cost with minimal accuracy loss.
Sparse training can sometimes enhance ViT accuracy.
Effective sparsification across diverse ViT architectures on ImageNet.
Abstract
Vision transformers (ViTs) have recently received explosive popularity, but their enormous model sizes and training costs remain daunting. Conventional post-training pruning often incurs higher training budgets. In contrast, this paper aims to trim down both the training memory overhead and the inference complexity, without sacrificing the achievable accuracy. We carry out the first-of-its-kind comprehensive exploration, on taking a unified approach of integrating sparsity in ViTs "from end to end". Specifically, instead of training full ViTs, we dynamically extract and train sparse subnetworks, while sticking to a fixed small parameter budget. Our approach jointly optimizes model parameters and explores connectivity throughout training, ending up with one sparse network as the final output. The approach is seamlessly extended from unstructured to structured sparsity, the latter by…
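The idea of exploring connectivity during training while holding a fixed parameter budget can be sketched as a prune-and-grow step on a weight mask. This is a minimal sketch under stated assumptions: the pruning criterion (smallest weight magnitude) and the growth criterion (largest gradient magnitude) are common choices in dynamic sparse training and are assumed here, not confirmed by the text above. The function name `prune_and_grow` and the `update_fraction` parameter are hypothetical.

```python
import random


def prune_and_grow(weights, mask, grads, update_fraction):
    """One connectivity update that keeps the number of active weights fixed.

    weights, mask, grads are flat lists of equal length; mask entries are 0 or 1.
    update_fraction is the share of active weights to swap out in this step.
    """
    active = [i for i, m in enumerate(mask) if m == 1]
    inactive = [i for i, m in enumerate(mask) if m == 0]
    k = min(int(update_fraction * len(active)), len(inactive))

    # Prune: drop the k active weights with the smallest magnitude (assumed criterion).
    to_prune = sorted(active, key=lambda i: abs(weights[i]))[:k]
    # Grow: activate the k previously inactive positions with the largest
    # gradient magnitude (assumed criterion).
    to_grow = sorted(inactive, key=lambda i: abs(grads[i]), reverse=True)[:k]

    new_weights, new_mask = list(weights), list(mask)
    for i in to_prune:
        new_mask[i] = 0
        new_weights[i] = 0.0
    for i in to_grow:
        new_mask[i] = 1
        new_weights[i] = 0.0  # newly grown connections start from zero
    return new_weights, new_mask


# Toy example: 20 weights, only 5 of which are active (75% sparsity).
random.seed(0)
mask = [1] * 5 + [0] * 15
random.shuffle(mask)
weights = [random.gauss(0, 1) * m for m in mask]
grads = [random.gauss(0, 1) for _ in range(len(mask))]

new_weights, new_mask = prune_and_grow(weights, mask, grads, update_fraction=0.4)
print(sum(mask), sum(new_mask))  # both counts are 5: the budget is unchanged
```

Because every pruned connection is matched by exactly one grown connection, the active-parameter count never exceeds the budget, so the dense model is never materialized during training.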
Taxonomy
Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Advanced Memory and Neural Computing
Methods: Pruning
