CODA: Difficulty-Aware Compute Allocation for Adaptive Reasoning
Siye Wu, Jian Xie, Yikai Zhang, Yanghua Xiao

TL;DR
CODA introduces a dynamic compute allocation method for reasoning models that adjusts reasoning depth based on instance difficulty, reducing costs on simple tasks and enhancing performance on complex ones.
Contribution
The paper formalizes adaptive reasoning as a utility maximization problem and proposes CODA, a novel difficulty-aware compute allocation method that operates without external annotations.
Findings
CODA reduces token costs by over 60% on easy tasks while maintaining accuracy.
On hard tasks, CODA encourages more deliberative reasoning to improve performance.
CODA achieves adaptive reasoning across different model scales and benchmarks.
Abstract
The emergence of large reasoning models demonstrates that scaling inference-time compute significantly enhances performance on complex tasks. However, these models often fall into a trap of their own: overthinking simple problems, where repetitive rationales yield minimal accuracy gains at a disproportionately high cost. This motivates adaptive reasoning: dynamically aligning reasoning depth with instance difficulty. In this paper, we study adaptive reasoning from an optimality perspective, formalizing it as a utility maximization problem where tokens are allocated until the marginal accuracy gain falls below the incremental cost. Based on this, we propose CODA (Compute Allocation by Difficulty Awareness), a method that operationalizes this principle by allocating tokens via a policy-internal difficulty signal. Specifically, CODA estimates difficulty via group-based rollouts and maps it to two…
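The stopping rule described in the abstract (allocate tokens until the marginal accuracy gain falls below the incremental cost) can be illustrated with a small sketch. This is not the paper's implementation: the function names `allocate_tokens` and `group_difficulty`, the step size, the per-token cost, and the exponential accuracy curves are all assumptions made for illustration. The difficulty estimate is likewise one plausible reading of "group-based rollouts" (the fraction of sampled rollouts that fail), and the greedy stop is only optimal when accuracy shows diminishing returns in the token budget.

```python
import math
from typing import Callable, Sequence


def allocate_tokens(
    accuracy_at: Callable[[int], float],
    cost_per_token: float,
    step: int = 64,
    max_tokens: int = 4096,
) -> int:
    """Grow the budget while one more step's accuracy gain covers its cost.

    Hypothetical helper: `accuracy_at(n)` is an assumed, known accuracy
    curve for a budget of n tokens. Assumes diminishing returns.
    """
    budget = 0
    while budget + step <= max_tokens:
        marginal_gain = accuracy_at(budget + step) - accuracy_at(budget)
        incremental_cost = cost_per_token * step
        if marginal_gain < incremental_cost:
            break  # further tokens cost more than they are worth
        budget += step
    return budget


def group_difficulty(rollout_correct: Sequence[bool]) -> float:
    """Assumed difficulty proxy: failure rate over a group of rollouts."""
    if not rollout_correct:
        raise ValueError("need at least one rollout")
    return 1.0 - sum(rollout_correct) / len(rollout_correct)


# Illustrative accuracy curves (assumptions, not measured data):
# the easy instance saturates quickly, the hard one slowly.
def easy_curve(n: int) -> float:
    return 1.0 - math.exp(-n / 100.0)


def hard_curve(n: int) -> float:
    return 1.0 - math.exp(-n / 1000.0)


easy_budget = allocate_tokens(easy_curve, cost_per_token=1e-4)
hard_budget = allocate_tokens(hard_curve, cost_per_token=1e-4)
print(easy_budget, hard_budget)
print(group_difficulty([True, True, False, False]))
```

Under these assumed curves the easy instance stops at a much smaller budget than the hard one, which is the qualitative behavior the summary attributes to CODA: fewer tokens on simple problems, more deliberation on difficult ones.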
