AdaPerceiver: Transformers with Adaptive Width, Depth, and Tokens
Purvish Jajal, Nick John Eliopoulos, Benjamin Shiue-Hal Chou, George K. Thiruvathukal, Yung-Hsiang Lu, and James C. Davis

TL;DR
AdaPerceiver introduces a unified transformer architecture capable of adaptively adjusting depth, width, and tokens, enabling efficient deployment across diverse tasks and hardware constraints while maintaining high performance.
Contribution
AdaPerceiver is the first transformer architecture with unified adaptivity across depth, width, and tokens within a single model, supported by a joint training regime that maintains performance across its configurations.
Findings
Achieves better accuracy-throughput trade-offs on image classification.
Reduces FLOPs significantly in dense prediction tasks while maintaining accuracy.
Maintains ImageNet accuracy with substantial FLOPs reduction.
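For intuition on why depth, width, and token count each affect compute, the sketch below uses the common back-of-the-envelope cost estimate for a standard transformer block. It is not AdaPerceiver's cost model and does not reproduce any number from the paper; the function names `layer_macs` and `model_macs` and the ViT-Base-like sizes (12 layers, 196 tokens, width 768) are illustrative assumptions.

```python
def layer_macs(tokens: int, width: int, mlp_ratio: int = 4) -> int:
    """Approximate multiply-accumulates (MACs) for one standard transformer block.

    Assumption: plain self-attention plus a two-layer MLP; biases, norms,
    softmax, and any Perceiver-style cross-attention are ignored.
    """
    proj = 4 * tokens * width * width             # Q, K, V, and output projections
    attn = 2 * tokens * tokens * width            # QK^T scores plus weighted sum over V
    mlp = 2 * mlp_ratio * tokens * width * width  # two linear layers of the MLP
    return proj + attn + mlp


def model_macs(depth: int, tokens: int, width: int, mlp_ratio: int = 4) -> int:
    """Total MACs for `depth` identical blocks (embedding and head ignored)."""
    return depth * layer_macs(tokens, width, mlp_ratio)


# Hypothetical baseline sizes, chosen only for illustration.
full = model_macs(depth=12, tokens=196, width=768)

# Halve one axis at a time and compare against the baseline.
configs = {
    "half depth": model_macs(depth=6, tokens=196, width=768),
    "half width": model_macs(depth=12, tokens=196, width=384),
    "half tokens": model_macs(depth=12, tokens=98, width=768),
}

for name, macs in configs.items():
    print(f"{name}: {macs / full:.2f}x of baseline MACs")
```

Under this estimate, cost is linear in depth, roughly quadratic in width, and between linear and quadratic in token count, so the three axes trade compute for accuracy in different ways.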
Abstract
Modern transformer architectures achieve remarkable performance across tasks and domains but remain rigid in how they allocate computation at inference time. Real-world deployment often requires models to adapt to diverse hardware and latency constraints, yet most approaches to dynamic computation focus on a single axis -- such as reducing the number of tokens. We present a novel capability: AdaPerceiver, the first transformer architecture with unified adaptivity across depth, width, and tokens within a single model. We propose an architecture that supports adaptivity along these axes. We couple this with an efficient joint training regime that ensures the model maintains performance across its various configurations. We evaluate AdaPerceiver on image classification, semantic segmentation, and depth estimation tasks. On image classification, AdaPerceiver expands the accuracy-throughput…
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Vision and Imaging · Video Coding and Compression Technologies
