Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference
Ting Liu, Xuyang Liu, Liangtao Shi, Zunnan Xu, Yue Hu, Siteng Huang, Yi Xin, Bineng Zhong, Donglin Wang

TL;DR
Sparse-Tuning introduces a novel framework combining token sparsification and dense adapters to efficiently fine-tune Vision Transformers, significantly reducing computational costs while maintaining high performance across image and video tasks.
Contribution
Sparse-Tuning extends PEFT from fine-tuning efficiency to inference efficiency: token sparsification cuts computation, and dense adapters compensate for the information that sparsification discards.
Findings
Reduces GFLOPs to 66% of the original ViT-B's
Achieves state-of-the-art performance on multiple datasets
Maintains performance with significantly less computation
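To make the compute finding concrete, the sketch below estimates how encoder cost scales with the number of tokens per layer. It uses the standard per-layer count for a transformer encoder (projections, attention, MLP) and counts multiply-accumulates, which is an approximation of what papers report as GFLOPs. The function names, the pruning layers {4, 7, 10}, and the keep rate 0.7 are illustrative assumptions, not settings taken from the paper, so the printed ratio is not expected to reproduce the 66% figure.

```python
import math


def vit_macs(tokens_per_layer, dim=768, mlp_ratio=4):
    """Approximate multiply-accumulates for a stack of transformer encoder layers.

    tokens_per_layer: number of tokens entering each layer.
    Ignores patch embedding, layer norms, softmax, and the classification head.
    """
    total = 0
    for n in tokens_per_layer:
        projections = 4 * n * dim * dim      # Q, K, V and output projections
        attention = 2 * n * n * dim          # QK^T plus the attention-weighted sum of V
        mlp = 2 * mlp_ratio * n * dim * dim  # the two linear layers of the MLP
        total += projections + attention + mlp
    return total


def token_schedule(n_patches, depth, prune_layers, keep_rate):
    """Token count entering each layer (1-indexed) under a hypothetical schedule.

    Simplification: patch tokens are pruned at the start of each layer listed
    in prune_layers; the class token is always kept.
    """
    schedule = []
    patches = n_patches
    for layer in range(1, depth + 1):
        if layer in prune_layers:
            patches = math.ceil(keep_rate * patches)
        schedule.append(patches + 1)  # +1 for the class token
    return schedule


# ViT-B/16 at 224x224: 196 patch tokens + 1 class token, 12 layers, width 768.
dense = vit_macs([197] * 12)
sparse = vit_macs(token_schedule(196, 12, {4, 7, 10}, 0.7))
print(f"dense:  {dense / 1e9:.2f} GMACs")
print(f"sparse: {sparse / 1e9:.2f} GMACs ({sparse / dense:.0%} of dense)")
```

Because most of the per-layer cost is linear in the token count, the saving depends mainly on how early and how aggressively tokens are removed.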
Abstract
Parameter-efficient fine-tuning (PEFT) has emerged as a popular solution for adapting pre-trained Vision Transformer (ViT) models to downstream applications by updating only a small subset of parameters. While current PEFT methods have achieved fine-tuning efficiency, they overlook the efficiency of computation and GPU memory during inference, falling short of practical requirements. To address this limitation, we propose Sparse-Tuning, an efficient and effective framework that leverages popular token sparsification (TS) techniques to reduce information redundancy in images and videos, thereby significantly improving computational and memory efficiency. However, TS often compromises performance due to inevitable information loss. We therefore further introduce Dense Adapters (DA) to compensate for the information losses incurred by token sparsification. DA integrates…
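The token sparsification step mentioned in the abstract can be illustrated with a minimal sketch. It shows one common form of TS, keeping the highest-scoring tokens and fusing the rest into a single weighted-average token; this is an assumption for illustration and not necessarily the variant used by Sparse-Tuning. The function name `sparsify_tokens`, the use of per-token importance scores (for example, class-token attention), and the fusing rule are all hypothetical here. Dense Adapters are not sketched because the summary does not describe their design.

```python
import math


def sparsify_tokens(tokens, scores, keep_rate):
    """Keep the highest-scoring tokens and fuse the rest into one extra token.

    tokens: list of feature vectors (lists of floats), one per patch token.
    scores: one non-negative importance score per token.
    Returns (new_tokens, kept_indices); the fused token, if any, comes last.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    n_keep = max(1, math.ceil(keep_rate * len(tokens)))
    # Rank token indices from most to least important.
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    keep_idx = sorted(order[:n_keep])  # restore the original token order
    drop_idx = order[n_keep:]
    new_tokens = [tokens[i] for i in keep_idx]
    if drop_idx:
        dim = len(tokens[0])
        weight = sum(scores[i] for i in drop_idx)
        if weight > 0:
            # Score-weighted average of the dropped tokens.
            fused = [sum(scores[i] * tokens[i][d] for i in drop_idx) / weight
                     for d in range(dim)]
        else:
            # Fall back to a plain mean when all dropped scores are zero.
            fused = [sum(tokens[i][d] for i in drop_idx) / len(drop_idx)
                     for d in range(dim)]
        new_tokens.append(fused)
    return new_tokens, keep_idx


# Four toy 2-D tokens; keep the top half and fuse the other two.
tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]
scores = [0.4, 0.1, 0.3, 0.2]
new_tokens, kept = sparsify_tokens(tokens, scores, keep_rate=0.5)
print(kept, len(new_tokens))
```

Fusing rather than discarding the low-scoring tokens keeps a coarse summary of the removed content, at the cost of one extra token per sparsification step.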
Taxonomy
Topics: Advanced Vision and Imaging · CCD and CMOS Imaging Sensors · Color Science and Applications
Methods: Attention Is All You Need · ALIGN · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Label Smoothing · Adam · Absolute Position Encodings · Dropout · Softmax
