Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs

Mohammad Ali Alomrani; Yingxue Zhang; Derek Li; Qianyi Sun; Soumyasundar Pal; Zhanguang Zhang; Yaochen Hu; Rohan Deepak Ajwani; Antonios Valkanas; Raika Karimi; Peng Cheng; Yunzhou Wang; Pengyi Liao; Hanrui Huang; Bin Wang; Jianye Hao; Mark Coates

arXiv:2507.02076·cs.AI·July 4, 2025

TL;DR

This survey reviews strategies for adaptive and controllable test-time compute (TTC) in large language models, focusing on making reasoning more efficient by controlling, or dynamically scaling, how much compute is spent at inference time.

Contribution

It introduces a two-tiered taxonomy of TTC methods (L1-controllability and L2-adaptiveness), benchmarks leading proprietary LLMs across diverse datasets, and discusses future directions for efficient, robust, and user-responsive reasoning.

Findings

01. The benchmark identifies trade-offs between reasoning accuracy and token efficiency.

02. L2-adaptive methods outperform fixed-compute strategies across diverse tasks.

03. Emerging hybrid models show promise for scalable reasoning.

Abstract

Large language models (LLMs) have rapidly progressed into general-purpose agents capable of solving a broad spectrum of tasks. However, current models remain inefficient at reasoning: they apply fixed inference-time compute regardless of task complexity, often overthinking simple problems while underthinking hard ones. This survey presents a comprehensive review of efficient test-time compute (TTC) strategies, which aim to improve the computational efficiency of LLM reasoning. We introduce a two-tiered taxonomy that distinguishes between L1-controllability, methods that operate under fixed compute budgets, and L2-adaptiveness, methods that dynamically scale inference based on input difficulty or model confidence. We benchmark leading proprietary LLMs across diverse datasets, highlighting critical trade-offs between reasoning performance and token usage. Compared to prior surveys on…
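To make the L1/L2 distinction concrete, here is a minimal Python sketch built on majority voting over sampled answers. It is an illustration under stated assumptions, not a method taken from the survey: the `Sampler` callable, the function names, the agreement-based stopping rule, and the `threshold` and `min_samples` defaults are all hypothetical, and the toy sampler stands in for a real model call.

```python
import random
from collections import Counter
from typing import Callable, Tuple

# Hypothetical interface: each call returns one sampled final answer from a model.
Sampler = Callable[[], str]


def fixed_budget_vote(sample: Sampler, budget: int) -> Tuple[str, int]:
    """L1-style control: always spend exactly `budget` samples, then majority-vote."""
    if budget < 1:
        raise ValueError("budget must be at least 1")
    votes = Counter(sample() for _ in range(budget))
    return votes.most_common(1)[0][0], budget


def adaptive_vote(
    sample: Sampler,
    max_budget: int,
    min_samples: int = 3,      # assumed default, chosen for illustration
    threshold: float = 0.8,    # assumed agreement level that counts as "confident"
) -> Tuple[str, int]:
    """L2-style adaptation: stop early once the leading answer's share of votes
    reaches `threshold`; otherwise keep sampling up to `max_budget`."""
    if max_budget < 1:
        raise ValueError("max_budget must be at least 1")
    votes: Counter = Counter()
    for used in range(1, max_budget + 1):
        votes[sample()] += 1
        answer, count = votes.most_common(1)[0]
        if used >= min_samples and count / used >= threshold:
            return answer, used
    return votes.most_common(1)[0][0], max_budget


def make_toy_sampler(p_correct: float, rng: random.Random) -> Sampler:
    """Toy stand-in for an LLM: answers "42" with probability `p_correct`."""
    def sample() -> str:
        return "42" if rng.random() < p_correct else rng.choice(["41", "43"])
    return sample


if __name__ == "__main__":
    rng = random.Random(0)
    for label, p in [("easy", 0.95), ("hard", 0.55)]:
        sampler = make_toy_sampler(p, rng)
        print(label, "fixed:", fixed_budget_vote(sampler, 16))
        print(label, "adaptive:", adaptive_vote(sampler, 16))
```

In this sketch the fixed-budget function spends the same number of samples on every input, while the adaptive one tends to stop early when samples agree and falls back to the full budget when they do not. Agreement among samples is only one possible confidence proxy; it is used here because it needs nothing beyond the sampled answers.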
