Chasing Sparsity in Vision Transformers: An End-to-End Exploration

Tianlong Chen; Yu Cheng; Zhe Gan; Lu Yuan; Lei Zhang; Zhangyang Wang

arXiv:2106.04533 · cs.CV · October 26, 2021 · 85 cites

TL;DR

This paper introduces a comprehensive end-to-end approach for sparsifying vision transformers, reducing training and inference costs while maintaining or even improving accuracy through dynamic subnetwork extraction and joint optimization.

Contribution

It presents the first unified method to integrate both unstructured and structured sparsity in ViTs, including a novel learnable token selector for adaptive patch importance.
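As a rough illustration of what selecting patches by importance can look like, the sketch below keeps the highest-scoring tokens and drops the rest. It is not the paper's selector: the function name `select_tokens`, the `keep_ratio` parameter, and the hard top-k rule are assumptions made for this example, and the scores are passed in directly rather than produced by a learned module.

```python
def select_tokens(tokens, scores, keep_ratio=0.5):
    """Keep the highest-scoring patch tokens, preserving their original order.

    tokens: list of patch tokens (any objects).
    scores: list of floats, one importance score per token. In a real model
            these would come from a trainable scoring module (assumption).
    Returns (kept_tokens, kept_indices).
    """
    n_keep = max(1, int(len(tokens) * keep_ratio))
    # Indices of the n_keep largest scores.
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:n_keep]
    # Restore the original patch order so positional structure is not shuffled.
    kept = sorted(top)
    return [tokens[i] for i in kept], kept


# Hypothetical usage with four placeholder patches.
patches = ["p0", "p1", "p2", "p3"]
importance = [0.1, 0.9, 0.3, 0.7]
kept_patches, kept_idx = select_tokens(patches, importance, keep_ratio=0.5)
```

Fewer tokens entering the attention layers means less computation per image, which is the motivation for scoring patches at all.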

Findings

01. Significant reduction in computational cost with minimal accuracy loss.

02. Sparse training can sometimes enhance ViT accuracy.

03. Effective sparsification across diverse ViT architectures on ImageNet.

Abstract

Vision transformers (ViTs) have recently received explosive popularity, but their enormous model sizes and training costs remain daunting. Conventional post-training pruning often incurs higher training budgets. In contrast, this paper aims to trim down both the training memory overhead and the inference complexity, without sacrificing the achievable accuracy. We carry out the first-of-its-kind comprehensive exploration, on taking a unified approach of integrating sparsity in ViTs "from end to end". Specifically, instead of training full ViTs, we dynamically extract and train sparse subnetworks, while sticking to a fixed small parameter budget. Our approach jointly optimizes model parameters and explores connectivity throughout training, ending up with one sparse network as the final output. The approach is seamlessly extended from unstructured to structured sparsity, the latter by…
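The idea of exploring connectivity under a fixed parameter budget can be sketched as a prune-and-grow step on a binary mask. This is a minimal illustration, not the authors' implementation: the function name `prune_and_grow`, the `update_fraction` parameter, and the specific criteria (prune by smallest weight magnitude, grow by largest gradient magnitude) are assumptions chosen for the example.

```python
def prune_and_grow(weights, mask, grads, update_fraction=0.2):
    """One connectivity update on a flat weight vector with a fixed budget.

    weights, grads: lists of floats of equal length.
    mask: list of 0/1 flags (1 = connection is active).
    Returns a new mask with the same number of active entries as `mask`.
    """
    active = [i for i, m in enumerate(mask) if m]
    inactive = [i for i, m in enumerate(mask) if not m]
    # Swap the same number of connections in and out so the budget is unchanged.
    k = min(int(update_fraction * len(active)), len(inactive))

    # Prune (assumed criterion): the k active weights with the smallest magnitude.
    pruned = sorted(active, key=lambda i: abs(weights[i]))[:k]
    # Grow (assumed criterion): the k previously inactive positions with the
    # largest gradient magnitude.
    grown = sorted(inactive, key=lambda i: abs(grads[i]), reverse=True)[:k]

    new_mask = list(mask)
    for i in pruned:
        new_mask[i] = 0
    for i in grown:
        new_mask[i] = 1
    return new_mask


# Hypothetical usage: six weights, three of them active.
weights = [0.9, -0.05, 0.4, 0.0, 0.0, 0.0]
mask = [1, 1, 1, 0, 0, 0]
grads = [0.1, 0.2, 0.3, 0.8, -0.1, 0.05]
new_mask = prune_and_grow(weights, mask, grads, update_fraction=0.34)
```

In a training loop, a step like this would run periodically between ordinary gradient updates on the active weights, so the network never holds more than the budgeted number of parameters.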

Code & Models

Repositories

VITA-Group/SViTE (PyTorch, official)

Videos

Chasing Sparsity in Vision Transformers: An End-to-End Exploration · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Advanced Memory and Neural Computing

Methods: Pruning