Multiple-Exit Tuning: Towards Inference-Efficient Adaptation for Vision Transformer

Zheng Liu; Jinchao Zhu; Nannan Li; Gao Huang

arXiv:2409.13999 · cs.CV · September 24, 2024

TL;DR

This paper introduces multiple-exit tuning (MET), a method for vision transformers that improves inference efficiency by allowing easy samples to exit early, reducing computational costs while maintaining high accuracy.

Contribution

The paper proposes a novel multiple-exit tuning approach with shared adapters and graph regularization, enabling efficient inference in vision transformers.
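
The sketch below illustrates, under stated assumptions, what sharing adapter parameters across several exits can look like, and what a graph regularizer computes. The bottleneck shape (down-projection, ReLU, up-projection, residual add), the per-exit scalar `scales`, and the names `SharedExitAdapters`, `adapter_param_count`, and `graph_smoothness_penalty` are hypothetical illustrations, not MET's actual design. The penalty is one common generic form of graph regularization (samples with high affinity are pushed toward similar features); the text above does not specify the form MET uses.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(W: Matrix, v: Sequence[float]) -> List[float]:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class SharedExitAdapters:
    """Hypothetical adapters for several exits that share one down-projection
    (r x d) and one up-projection (d x r). Each exit owns only a scalar scale,
    an assumption made here to show exit-specific behaviour cheaply."""

    def __init__(self, down: Matrix, up: Matrix, num_exits: int):
        self.down = down
        self.up = up
        self.scales = [1.0] * num_exits  # hypothetical exit-specific parameters

    def __call__(self, h: Sequence[float], exit_idx: int) -> List[float]:
        z = [max(0.0, v) for v in matvec(self.down, h)]  # ReLU bottleneck
        delta = matvec(self.up, z)                        # project back to width d
        s = self.scales[exit_idx]
        return [hi + s * di for hi, di in zip(h, delta)]  # residual update


def adapter_param_count(d: int, r: int, num_exits: int, shared: bool) -> int:
    """Weight count (biases ignored) for bottleneck adapters at num_exits exits."""
    per_adapter = 2 * d * r
    if shared:
        return per_adapter + num_exits  # one projection pair + per-exit scales
    return num_exits * per_adapter


def graph_smoothness_penalty(feats: Sequence[Sequence[float]], affinity: Matrix) -> float:
    """Generic graph regularizer: sum over i, j of A_ij * ||z_i - z_j||^2."""
    total = 0.0
    for i, zi in enumerate(feats):
        for j, zj in enumerate(feats):
            total += affinity[i][j] * sum((a - b) ** 2 for a, b in zip(zi, zj))
    return total
```

With a hypothetical width d = 768 (the ViT-B hidden size), bottleneck r = 8, and 3 exits, `adapter_param_count` gives 12,291 weights when the projections are shared versus 36,864 when every exit has its own pair, which is the kind of saving parameter sharing is meant to buy.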

Findings

1. MET outperforms state-of-the-art methods in accuracy.
2. MET significantly reduces inference computational cost.
3. Early exits improve efficiency without sacrificing performance.
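
The cost reduction from early exits follows from a simple expectation. If a fraction p_k of test samples leaves at exit k and C_k is the cumulative backbone cost up to that exit, the average per-sample cost is the sum of p_k * C_k, which falls below the full-depth cost whenever some samples exit early. The sketch below computes this; the exit positions, the fractions, and the assumption that cost grows linearly with the number of transformer blocks (ignoring head and adapter overhead) are hypothetical and are not results from the paper.

```python
import math
from typing import Sequence


def expected_cost(exit_fractions: Sequence[float], cumulative_costs: Sequence[float]) -> float:
    """Average per-sample cost when a fraction p_k of samples stops at exit k
    and cumulative_costs[k] is the cost of running the backbone up to exit k."""
    if len(exit_fractions) != len(cumulative_costs):
        raise ValueError("need one fraction per exit")
    if not math.isclose(sum(exit_fractions), 1.0):
        raise ValueError("exit fractions must sum to 1")
    return sum(p * c for p, c in zip(exit_fractions, cumulative_costs))


# Hypothetical 12-block backbone with exits after blocks 4, 8 and 12,
# cost measured in "blocks run" (a linear-cost assumption).
costs = [4.0, 8.0, 12.0]
fractions = [0.5, 0.3, 0.2]
avg = expected_cost(fractions, costs)
print(f"average cost: {avg:.1f} blocks ({avg / costs[-1]:.0%} of full depth)")
```

With these made-up fractions the script prints `average cost: 6.8 blocks (57% of full depth)`; the real saving depends entirely on how many samples the model judges easy enough to stop early.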

Abstract

Parameter-efficient transfer learning (PETL) has shown great potential in adapting a vision transformer (ViT) pre-trained on large-scale datasets to various downstream tasks. Existing studies primarily focus on minimizing the number of learnable parameters. Although these methods are storage-efficient, they allocate excessive computational resources to easy samples, leading to inefficient inference. To address this issue, we introduce an inference-efficient tuning method termed multiple-exit tuning (MET). MET integrates multiple exits into the pre-trained ViT backbone. Since the predictions in ViT are made by a linear classifier, each exit is equipped with a linear prediction head. In the inference stage, easy samples exit at early exits and only sufficiently hard samples flow to the last exit, thus saving computational cost on easy samples. MET consists of exit-specific adapters…
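
The inference rule described in the abstract (easy samples stop at an early exit, hard ones continue to the last exit) can be sketched as follows. The exit criterion used here, stopping once the maximum softmax probability of an exit's linear head reaches a per-exit threshold, is an assumption borrowed from common early-exit practice; the abstract does not state MET's exact criterion. The toy `blocks` and `heads` stand in for the ViT segments between exits and their linear prediction heads and are purely illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_head(W: List[Vector], b: Vector) -> Callable[[Sequence[float]], Vector]:
    """Return a linear classifier h -> W h + b."""
    return lambda h: [sum(w * x for w, x in zip(row, h)) + bi for row, bi in zip(W, b)]


def early_exit_predict(
    x: Sequence[float],
    blocks: Sequence[Callable[[Sequence[float]], Vector]],
    heads: Sequence[Callable[[Sequence[float]], Vector]],
    thresholds: Sequence[float],
) -> Tuple[int, int]:
    """Run backbone segments in order; after each one, apply that exit's head and
    stop once the top softmax probability reaches the exit's threshold.
    The last exit always answers. Returns (predicted_class, exit_index)."""
    if not blocks or not (len(blocks) == len(heads) == len(thresholds)):
        raise ValueError("need one block, head and threshold per exit")
    h = list(x)
    for k, (block, head) in enumerate(zip(blocks, heads)):
        h = block(h)  # only the segments up to exit k are ever computed
        probs = softmax(head(h))
        conf = max(probs)
        if conf >= thresholds[k] or k == len(blocks) - 1:
            return probs.index(conf), k
    raise RuntimeError("unreachable: the last exit always returns")


# Toy two-exit model on 2-d features (illustrative values only).
blocks = [lambda h: list(h), lambda h: [2.0 * v for v in h]]
heads = [linear_head([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])] * 2
thresholds = [0.9, 0.0]  # the last threshold is never consulted

print(early_exit_predict([3.0, 0.0], blocks, heads, thresholds))  # → (0, 0)
print(early_exit_predict([0.5, 0.0], blocks, heads, thresholds))  # → (0, 1)
```

The first input is confident at exit 0 and skips the second segment; the second is not, so it runs to the last exit. Lowering `thresholds[0]` sends more samples out early and cuts average cost, at some risk to accuracy, so sweeping it traces an accuracy/cost trade-off.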

Taxonomy

Topics: CCD and CMOS Imaging Sensors · Neural Networks and Reservoir Computing · Advanced Memory and Neural Computing

Methods: Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Multi-Head Attention · Layer Normalization · Residual Connection · Vision Transformer · Focus