Understanding Dynamic Compute Allocation in Recurrent Transformers
Ibraheem Muhammad Moosa, Suhas Lohit, Ye Wang, Moitreya Chatterjee, Wenpeng Yin

TL;DR
This paper introduces a complexity-controlled evaluation paradigm and a recurrent Transformer framework, ANIRA, for analyzing token-level adaptive computation. It finds that models can align compute with task complexity without supervision but fail to generalize to larger input sizes.
Contribution
It presents a complexity-controlled evaluation method, the ANIRA framework for variable-depth computation, and a systematic analysis of adaptive compute allocation in recurrent Transformers.
Findings
Compute allocation can align with task complexity without supervision.
Models fail to generalize to larger input sizes despite adaptive computation.
Early compute decisions rely on static cues, while online halting tracks execution state.
Abstract
Token-level adaptive computation seeks to reduce inference cost by allocating more computation to harder tokens and less to easier ones. However, prior work is primarily evaluated on natural-language benchmarks using task-level metrics, where token-level difficulty is unobservable and confounded with architectural factors, making it unclear whether compute allocation truly aligns with underlying complexity. We address this gap through three contributions. First, we introduce a complexity-controlled evaluation paradigm using algorithmic and synthetic language tasks with parameterized difficulty, enabling direct testing of token-level compute allocation. Second, we propose ANIRA, a unified recurrent Transformer framework that supports per-token variable-depth computation while isolating compute allocation decisions from other model factors. Third, we use this framework to conduct a…
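To make the idea of per-token variable-depth computation concrete, here is a minimal toy sketch in plain Python. It is not the ANIRA implementation: the names `adaptive_depth`, `step`, and `should_halt` are hypothetical, each token state is a single float rather than a hidden vector, and the halting rule is a fixed threshold rather than anything learned. It only illustrates the general pattern of reusing one shared block across iterations while each token stops independently.

```python
from typing import Callable, List, Tuple


def adaptive_depth(
    states: List[float],
    step: Callable[[float], float],
    should_halt: Callable[[float], bool],
    max_steps: int,
) -> Tuple[List[float], List[int]]:
    """Apply one shared `step` to every token until that token halts.

    Returns the final states and the number of steps spent on each token.
    All names here are illustrative assumptions, not taken from the paper.
    """
    out = list(states)
    depth = [0] * len(out)
    # A token that already satisfies the halting test receives no compute.
    active = [not should_halt(s) for s in out]

    for _ in range(max_steps):
        if not any(active):
            break
        for i, is_active in enumerate(active):
            if not is_active:
                continue
            out[i] = step(out[i])   # same shared block for every token
            depth[i] += 1           # per-token compute counter
            if should_halt(out[i]):
                active[i] = False   # online halting based on current state

    return out, depth


# Toy example: larger values stand in for "harder" tokens.
tokens = [1.0, 8.0, 64.0]
final, depth = adaptive_depth(
    tokens,
    step=lambda x: x / 2.0,          # stand-in for one recurrent block
    should_halt=lambda x: x < 1.5,   # stand-in for a halting decision
    max_steps=10,
)
print(depth)  # [0, 3, 6]
```

In this toy setting the per-token depth grows with the stand-in difficulty, which is the kind of alignment a complexity-controlled evaluation is designed to measure directly. Lowering `max_steps` caps the compute spent on the hardest token, so it may stop before its halting test fires.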
Taxonomy
Topics: Natural Language Processing Techniques · Parallel Computing and Optimization Techniques · Ferroelectric and Negative Capacitance Devices
