Transformer^-1: Input-Adaptive Computation for Resource-Constrained Deployment
Lumen AI, Tengzhou No. 1 Middle School, Shihao Ji, Zihui Song, Fucheng Zhong, Jisen Jia, Zhaobo Wu, Zheyi Cao, Xu Tianhao

TL;DR
This paper introduces Transformer^-1, an adaptive architecture that allocates computation dynamically according to input complexity. It significantly reduces FLOPs and memory usage while maintaining accuracy, which makes it suitable for resource-constrained deployment.
Contribution
The paper presents a novel input-adaptive Transformer architecture with a two-layer control mechanism, theoretical efficiency bounds, and engineering solutions that enable efficient deployment in resource-limited environments.
Findings
42.7% reduction in FLOPs on ImageNet-1K
34.1% decrease in peak memory usage
Accuracy maintained within ±0.3% of a standard Transformer
Abstract
To address the resource waste that fixed computation paradigms cause in deep learning models under dynamic scenarios, this paper proposes a Transformer architecture based on the principle of deep adaptivity. The architecture dynamically matches computational resources to input features by establishing a joint optimization model for complexity and computation. Our core contributions include: (1) designing a two-layer control mechanism, composed of a complexity predictor and a reinforcement learning policy network, that enables end-to-end optimization of computation paths; (2) deriving a lower bound for dynamic computation and proving that the system can theoretically reach optimal efficiency; and (3) proposing a layer folding technique and a CUDA Graph pre-compilation scheme that overcome the engineering bottlenecks of dynamic architectures. In the ImageNet-1K benchmark…
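The two-layer control idea in the abstract (score the input, then pick a computation path) can be illustrated with a small toy. The sketch below is not the authors' method: `complexity_score`, `choose_depth`, `toy_layer`, and `adaptive_forward` are hypothetical names, the complexity predictor is replaced by a simple variance heuristic, and the reinforcement learning policy network is replaced by a fixed rule. It only shows the general shape of running fewer layers on easier inputs.

```python
import math


def complexity_score(tokens):
    """Hypothetical stand-in for a complexity predictor.

    Uses the variance of the input features, squashed into [0, 1).
    """
    mean = sum(tokens) / len(tokens)
    var = sum((t - mean) ** 2 for t in tokens) / len(tokens)
    return var / (1.0 + var)


def choose_depth(score, max_layers, min_layers=1):
    """Hypothetical stand-in for the policy network.

    A fixed rule (assumption): scale the number of executed layers
    linearly with the complexity score, clamped to the valid range.
    """
    depth = min_layers + math.ceil(score * (max_layers - min_layers))
    return max(min_layers, min(max_layers, depth))


def toy_layer(tokens, weight):
    """Toy residual update standing in for one Transformer block."""
    mean = sum(tokens) / len(tokens)
    return [t + weight * math.tanh(mean - t) for t in tokens]


def adaptive_forward(tokens, layer_weights):
    """Run only as many layers as the controller selects for this input."""
    score = complexity_score(tokens)
    depth = choose_depth(score, max_layers=len(layer_weights))
    for weight in layer_weights[:depth]:
        tokens = toy_layer(tokens, weight)
    return tokens, depth


layer_weights = [0.1] * 12            # a 12-layer toy model (assumed size)
easy_input = [0.5, 0.5, 0.5, 0.5]     # uniform features, low complexity
hard_input = [-3.0, 4.0, -2.0, 5.0]   # widely spread features, high complexity

_, easy_depth = adaptive_forward(easy_input, layer_weights)
_, hard_depth = adaptive_forward(hard_input, layer_weights)
print("layers used:", easy_depth, "vs", hard_depth)
```

In this toy the uniform input exits after far fewer layers than the spread-out one, which is the kind of per-input saving that an aggregate FLOPs reduction would be averaged over.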
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Real-Time Systems Scheduling
Methods: Attention Is All You Need · Softmax · Adam · Residual Connection · Dropout · Absolute Position Encodings · Byte Pair Encoding · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer
