AdaPerceiver: Transformers with Adaptive Width, Depth, and Tokens

Purvish Jajal; Nick John Eliopoulos; Benjamin Shiue-Hal Chou; George K. Thiruvathukal; Yung-Hsiang Lu; and James C. Davis

arXiv:2511.18105 · cs.CV · November 25, 2025

Open Access

TL;DR

AdaPerceiver introduces a unified transformer architecture capable of adaptively adjusting depth, width, and tokens, enabling efficient deployment across diverse tasks and hardware constraints while maintaining high performance.

Contribution

AdaPerceiver is the first transformer architecture with unified adaptivity across depth, width, and tokens within a single model, supported by a joint training regime that maintains performance across its configurations.

Findings

01. Achieves better accuracy-throughput trade-offs on image classification.

02. Reduces FLOPs significantly in dense prediction tasks while maintaining accuracy.

03. Maintains ImageNet accuracy with substantial FLOPs reduction.
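
To give a rough sense of why each of the three axes affects compute, the sketch below uses a common back-of-the-envelope cost model for a generic self-attention transformer block. It is not the paper's architecture or its FLOPs accounting: the functions `block_macs` and `model_macs`, the MLP ratio of 4, and the baseline configuration (12 layers, 196 tokens, width 768) are all assumptions chosen for illustration.

```python
def block_macs(tokens: int, width: int, mlp_ratio: int = 4) -> int:
    """Approximate multiply-accumulates for one generic transformer block.

    Assumption: standard self-attention plus a two-layer MLP; biases,
    normalization, and softmax are ignored.
    """
    proj = 4 * tokens * width * width            # Q, K, V, and output projections
    attn = 2 * tokens * tokens * width           # QK^T scores and weighted sum of V
    mlp = 2 * tokens * width * mlp_ratio * width  # two MLP linear layers
    return proj + attn + mlp


def model_macs(depth: int, tokens: int, width: int, mlp_ratio: int = 4) -> int:
    """Total cost of `depth` identical blocks (embedding and head ignored)."""
    return depth * block_macs(tokens, width, mlp_ratio)


# Hypothetical baseline and three reduced configurations, one per axis.
baseline = model_macs(depth=12, tokens=196, width=768)
configs = {
    "half depth": model_macs(depth=6, tokens=196, width=768),
    "half width": model_macs(depth=12, tokens=196, width=384),
    "half tokens": model_macs(depth=12, tokens=98, width=768),
}

for name, macs in configs.items():
    print(f"{name}: {macs / baseline:.2f}x of baseline cost")
```

Under this model, depth scales cost linearly, width scales the projection and MLP terms quadratically, and token count scales the attention term quadratically, which is one way to see why a model that can adjust all three at inference time has a wide range of operating points.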

Abstract

Modern transformer architectures achieve remarkable performance across tasks and domains but remain rigid in how they allocate computation at inference time. Real-world deployment often requires models to adapt to diverse hardware and latency constraints, yet most approaches to dynamic computation focus on a single axis -- such as reducing the number of tokens. We present a novel capability: AdaPerceiver, the first transformer architecture with unified adaptivity across depth, width, and tokens within a single model. We propose an architecture that supports adaptivity along these axes. We couple this with an efficient joint training regime that ensures the model maintains performance across its various configurations. We evaluate AdaPerceiver on image classification, semantic segmentation, and depth estimation tasks. On image classification, AdaPerceiver expands the accuracy-throughput…

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Vision and Imaging · Video Coding and Compression Technologies