Multiple-Exit Tuning: Towards Inference-Efficient Adaptation for Vision Transformer
Zheng Liu, Jinchao Zhu, Nannan Li, Gao Huang

TL;DR
This paper introduces multiple-exit tuning (MET), a method for vision transformers that improves inference efficiency by allowing easy samples to exit early, reducing computational costs while maintaining high accuracy.
Contribution
The paper proposes a novel multiple-exit tuning approach with shared adapters and graph regularization, enabling efficient inference in vision transformers.
Findings
MET outperforms state-of-the-art methods in accuracy.
MET significantly reduces inference computational cost.
Early exits improve efficiency without sacrificing performance.
Abstract
Parameter-efficient transfer learning (PETL) has shown great potential in adapting a vision transformer (ViT) pre-trained on large-scale datasets to various downstream tasks. Existing studies primarily focus on minimizing the number of learnable parameters. Although these methods are storage-efficient, they allocate excessive computational resources to easy samples, leading to inefficient inference. To address this issue, we introduce an inference-efficient tuning method termed multiple-exit tuning (MET). MET integrates multiple exits into the pre-trained ViT backbone. Since the predictions in ViT are made by a linear classifier, each exit is equipped with a linear prediction head. At inference time, easy samples leave through early exits and only sufficiently hard samples flow to the last exit, thus saving computation on easy samples. MET consists of exit-specific adapters…
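The early-exit inference loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the exit criterion, so the softmax-confidence threshold used here is an assumption, and the names early_exit_predict, linear_head, threshold, and the toy stages are hypothetical stand-ins for the ViT blocks and the per-exit linear heads.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_head(weights: List[Vector], bias: Vector) -> Callable[[Vector], Vector]:
    """Build a linear prediction head: logits = W @ features + b."""
    def head(features: Vector) -> Vector:
        return [sum(w * x for w, x in zip(row, features)) + b
                for row, b in zip(weights, bias)]
    return head


def early_exit_predict(
    x: Vector,
    stages: Sequence[Callable[[Vector], Vector]],
    heads: Sequence[Callable[[Vector], Vector]],
    threshold: float = 0.9,  # ASSUMPTION: confidence-based exit rule, not from the paper
) -> Tuple[int, int, float]:
    """Run stages in order and stop at the first exit that is confident enough.

    Returns (predicted_class, exit_index, confidence). The last exit always
    answers, so hard samples pay for the full backbone.
    """
    if len(stages) != len(heads) or not stages:
        raise ValueError("need one head per stage and at least one stage")
    features = x
    last = len(stages) - 1
    for i, (stage, head) in enumerate(zip(stages, heads)):
        features = stage(features)          # stand-in for a group of ViT blocks
        probs = softmax(head(features))     # linear head attached to this exit
        confidence = max(probs)
        if confidence >= threshold or i == last:
            return probs.index(confidence), i, confidence
    raise AssertionError("unreachable: the last exit always returns")


# Toy setup (hypothetical): each stage doubles the features, each head is identity.
stages = [lambda f: [2.0 * v for v in f] for _ in range(3)]
heads = [linear_head([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]) for _ in range(3)]

easy = early_exit_predict([2.0, 0.0], stages, heads)   # confident after the first stage
hard = early_exit_predict([0.2, 0.0], stages, heads)   # never confident, uses the last exit
print(easy[1], hard[1])  # exit indices: 0 2
```

In this toy run the easy input stops after one stage while the hard input traverses all three, which is the source of the compute savings the abstract describes. A lower threshold trades accuracy for earlier exits.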
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Neural Networks and Reservoir Computing · Advanced Memory and Neural Computing
Methods: Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Multi-Head Attention · Layer Normalization · Residual Connection · Vision Transformer · Focus
