Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference

Ting Liu; Xuyang Liu; Liangtao Shi; Zunnan Xu; Yue Hu; Siteng Huang; Yi Xin; Bineng Zhong; Donglin Wang

arXiv:2405.14700 · cs.CV · December 19, 2025

TL;DR

Sparse-Tuning introduces a novel framework combining token sparsification and dense adapters to efficiently fine-tune Vision Transformers, significantly reducing computational costs while maintaining high performance across image and video tasks.

Contribution

Sparse-Tuning makes PEFT efficient at inference, not only during fine-tuning: token sparsification reduces the number of tokens the ViT processes, and Dense Adapters compensate for the information lost when tokens are discarded.
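As a rough illustration of what token sparsification does, the sketch below follows an EViT-style recipe, which is assumed here and may differ from the paper's exact rule. It ranks patch tokens by the attention they receive from the [CLS] token, keeps the top fraction, and fuses the rest into a single attention-weighted token. The function name `sparsify_tokens` and the `keep_rate` parameter are hypothetical, and the Dense Adapters are not modeled.

```python
def sparsify_tokens(tokens, cls_attn, keep_rate=0.7):
    """EViT-style token sparsification (hypothetical helper, for illustration).

    tokens:   patch-token vectors, excluding [CLS]
    cls_attn: attention weight from [CLS] to each patch token
    Returns (new_tokens, kept_idx): the kept tokens in their original order,
    followed by one fused token that is the attention-weighted average of
    the pruned tokens.
    """
    n_keep = max(1, int(len(tokens) * keep_rate))
    # Rank patch tokens by how much the [CLS] token attends to them.
    ranked = sorted(range(len(tokens)), key=lambda i: cls_attn[i], reverse=True)
    kept_idx = sorted(ranked[:n_keep])  # restore original (spatial) order
    dropped = ranked[n_keep:]

    new_tokens = [list(tokens[i]) for i in kept_idx]
    if dropped:
        total = sum(cls_attn[i] for i in dropped)
        weights = ([cls_attn[i] / total for i in dropped] if total > 0
                   else [1.0 / len(dropped)] * len(dropped))
        dim = len(tokens[0])
        # One fused token keeps a weighted summary of the discarded tokens.
        fused = [sum(w * tokens[i][d] for w, i in zip(weights, dropped))
                 for d in range(dim)]
        new_tokens.append(fused)
    return new_tokens, kept_idx


patches = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]
attn = [0.1, 0.4, 0.3, 0.2]
new_tokens, kept = sparsify_tokens(patches, attn, keep_rate=0.5)
print(kept)        # [1, 2]
print(new_tokens)  # patches 1 and 2, then one fused token
```

Every layer after the pruning point then processes fewer tokens, which is where the compute and memory savings come from.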

Findings

1. Reduces GFLOPs to 66% of the original ViT-B
2. Achieves state-of-the-art performance on multiple datasets
3. Maintains performance with significantly less computation
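To see why dropping tokens lowers GFLOPs, the sketch below counts the multiply-accumulates (MACs) of the matrix multiplications in a ViT-B encoder (width 768, 12 layers, 197 tokens at 224×224 input with 16×16 patches). Many vision papers report MAC counts as "FLOPs", so the ratio is what matters here. The pruning schedule, which keeps 70% of patch tokens at layers 3, 6 and 9 (counted from 0), is an assumption for illustration. It is not the configuration behind the 66% figure, and adapter overhead is not counted.

```python
def layer_macs(n_tokens: int, dim: int = 768, mlp_ratio: int = 4) -> int:
    """MACs of one ViT encoder layer (matmuls only; LayerNorm, softmax,
    GELU and bias additions are ignored)."""
    qkv = 3 * n_tokens * dim * dim                # Q, K, V projections
    attn = 2 * n_tokens * n_tokens * dim          # Q @ K^T and attention @ V
    proj = n_tokens * dim * dim                   # attention output projection
    mlp = 2 * n_tokens * dim * (mlp_ratio * dim)  # MLP fc1 and fc2
    return qkv + attn + proj + mlp


def token_schedule(n_patches=196, depth=12, prune_at=(3, 6, 9), keep_rate=0.7):
    """Tokens entering each layer under a hypothetical pruning schedule:
    patch tokens are cut to int(keep_rate * n) at each layer in prune_at,
    and the [CLS] token (+1) is always kept."""
    schedule, n = [], n_patches
    for layer in range(depth):
        if layer in prune_at:
            n = int(n * keep_rate)
        schedule.append(n + 1)
    return schedule


dense = sum(layer_macs(197) for _ in range(12))
sparse_schedule = token_schedule()
sparse = sum(layer_macs(n) for n in sparse_schedule)

print(sparse_schedule)  # [197, 197, 197, 138, 138, 138, 96, 96, 96, 67, 67, 67]
print(f"dense:  {dense / 1e9:.2f} GMACs")  # dense:  17.45 GMACs
print(f"sparse: {sparse / 1e9:.2f} GMACs ({sparse / dense:.0%} of dense)")  # sparse: 10.90 GMACs (62% of dense)
```

Because the per-layer cost grows linearly in the token count (projections and MLP) plus a quadratic attention term, the savings scale roughly with the fraction of tokens removed in the later layers.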

Abstract

Parameter-efficient fine-tuning (PEFT) has emerged as a popular solution for adapting pre-trained Vision Transformer (ViT) models to downstream applications by updating only a small subset of parameters. While current PEFT methods have achieved fine-tuning efficiency, they overlook the efficiency of computation and GPU memory during inference, falling short of practical requirements. To address this limitation, we propose Sparse-Tuning, an efficient and effective framework that leverages popular token sparsification (TS) techniques to reduce information redundancy in images and videos, thereby significantly improving computational and memory efficiency. However, TS often compromises performance due to inevitable information loss. To mitigate this, we further introduce Dense Adapters (DA) to compensate for the information losses incurred by token sparsification. DA integrates…
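To put "updating only a small subset of parameters" in numbers, the sketch below counts the parameters of a frozen ViT-B/16 backbone and of a generic bottleneck adapter (a down-projection to rank r followed by an up-projection back to the model width) inserted once per layer. The adapter design and r = 8 are assumptions for illustration. This is a standard PEFT adapter, not the paper's Dense Adapter, and the task-specific classification head is not counted.

```python
def vit_backbone_params(dim=768, depth=12, mlp_ratio=4, patch=16,
                        in_chans=3, n_patches=196):
    """Weights + biases of a ViT encoder, excluding the classification head."""
    hidden = mlp_ratio * dim
    per_layer = (
        3 * dim * dim + 3 * dim      # fused QKV projection
        + dim * dim + dim            # attention output projection
        + dim * hidden + hidden      # MLP fc1
        + hidden * dim + dim         # MLP fc2
        + 2 * 2 * dim                # two LayerNorms (scale + shift)
    )
    patch_embed = in_chans * patch * patch * dim + dim  # conv patch embedding
    cls_and_pos = dim + (n_patches + 1) * dim           # [CLS] + position embeddings
    final_norm = 2 * dim
    return depth * per_layer + patch_embed + cls_and_pos + final_norm


def bottleneck_adapter_params(dim=768, rank=8):
    """Hypothetical adapter: Linear(dim -> rank) and Linear(rank -> dim), with biases."""
    return (dim * rank + rank) + (rank * dim + dim)


backbone = vit_backbone_params()
trainable = 12 * bottleneck_adapter_params(rank=8)
print(f"frozen backbone:    {backbone:,} params")  # 85,798,656
print(f"trainable adapters: {trainable:,} params ({trainable / backbone:.2%})")  # 156,768 (0.18%)
```

Only the adapter weights receive gradients, which is what makes fine-tuning cheap; the inference cost of the frozen backbone is unchanged unless, as in Sparse-Tuning, the number of processed tokens is also reduced.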

Code & Models

Repositories

liuting20/sparse-tuning (PyTorch, official)

Taxonomy

Topics: Advanced Vision and Imaging · CCD and CMOS Imaging Sensors · Color Science and Applications

Methods: Attention Is All You Need · ALIGN · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Label Smoothing · Adam · Absolute Position Encodings · Dropout · Softmax