Strategic Scaling of Test-Time Compute: A Bandit Learning Approach

Bowen Zuo; Yinglun Zhu

arXiv:2506.12721 · cs.AI · April 24, 2026


TL;DR

This paper introduces a bandit learning approach for adaptive test-time compute allocation in large language models, improving efficiency by focusing compute on more challenging queries.

Contribution

The paper formulates test-time compute allocation as a bandit learning problem and develops algorithms that estimate query difficulty on the fly and prioritize queries accordingly, outperforming uniform allocation.
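To make the bandit framing concrete, here is a minimal Python sketch. It is not the authors' algorithm. It assumes that each query is an arm, that each pull draws one response which is correct with a fixed hidden probability (a stand-in for sampling a model and checking its answer), and that a query stops receiving compute once it is solved. The function names and the `success_probs` parameter are hypothetical.

```python
import random
from typing import List, Tuple


def uniform_allocation(
    success_probs: List[float], budget: int, rng: random.Random
) -> Tuple[List[bool], List[int]]:
    """Baseline: split the budget evenly, regardless of what is observed."""
    n = len(success_probs)
    per_query = budget // n
    solved = [False] * n
    pulls = [per_query] * n
    for i, p in enumerate(success_probs):
        for _ in range(per_query):
            # Keeps sampling even after a success: that spend is wasted.
            if rng.random() < p:
                solved[i] = True
    return solved, pulls


def adaptive_allocation(
    success_probs: List[float], budget: int, rng: random.Random
) -> Tuple[List[bool], List[int]]:
    """Assumed adaptive rule: round-robin over queries that are still unsolved."""
    n = len(success_probs)
    solved = [False] * n
    pulls = [0] * n
    used = 0
    while used < budget and not all(solved):
        for i, p in enumerate(success_probs):
            if solved[i]:
                continue  # Easy queries drop out after their first success.
            if used >= budget:
                break
            used += 1
            pulls[i] += 1
            if rng.random() < p:
                solved[i] = True
    return solved, pulls
```

Under these assumptions, an easy query drops out after its first success, so the remaining budget flows to the queries that are still unsolved. This rule alone does not separate hard but solvable queries from unsolvable ones.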

Findings

01. Achieves up to 11.10% performance improvement on math benchmarks.

02. Reduces unnecessary compute on unsolvable queries.

03. Empirically validates improved efficiency on multiple datasets.

Abstract

Scaling test-time compute has emerged as an effective strategy for improving the performance of large language models. However, existing methods typically allocate compute uniformly across all queries, overlooking variation in query difficulty. To address this inefficiency, we formulate test-time compute allocation as a novel bandit learning problem and propose adaptive algorithms that estimate query difficulty on the fly and allocate compute accordingly. Compared to uniform allocation, our algorithms allocate more compute to challenging queries while maintaining accuracy on easier ones. Among challenging queries, our algorithms further learn to prioritize solvable instances, effectively reducing excessive computing on unsolvable queries. We theoretically prove that our algorithms achieve better compute efficiency than uniform allocation and empirically validate their effectiveness on…
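One way to picture how an allocator could prioritize solvable instances among the unsolved queries is a UCB-style index over reward scores. The sketch below swaps in a standard upper-confidence-bound rule for illustration; it is not claimed to be the algorithm analyzed in the paper. It assumes each sampled response receives a score in [0, 1] from some scorer, that a query counts as solved once a score reaches `threshold`, and that queries with higher observed scores are more likely to be solvable. The names `ucb_allocation`, `samplers`, `threshold`, and `c` are hypothetical.

```python
import math
from typing import Callable, List, Tuple


def ucb_allocation(
    samplers: List[Callable[[], float]],
    budget: int,
    threshold: float = 0.9,
    c: float = 0.2,
) -> Tuple[List[bool], List[int]]:
    """Spend `budget` samples, favoring unsolved queries with high score indices."""
    n = len(samplers)
    pulls = [0] * n
    totals = [0.0] * n
    solved = [False] * n

    def pull(i: int) -> None:
        score = samplers[i]()  # One sampled response, scored in [0, 1].
        pulls[i] += 1
        totals[i] += score
        if score >= threshold:
            solved[i] = True

    used = 0
    # Initialization: one sample per query, as far as the budget allows.
    for i in range(n):
        if used >= budget:
            break
        pull(i)
        used += 1

    while used < budget:
        active = [i for i in range(n) if not solved[i]]
        if not active:
            break
        t = used

        def index(i: int) -> float:
            # Empirical mean score plus an exploration bonus.
            return totals[i] / pulls[i] + c * math.sqrt(math.log(t) / pulls[i])

        pull(max(active, key=index))
        used += 1

    return solved, pulls
```

With a small exploration constant `c`, a query whose scores stay near zero receives few further samples, while a query whose scores sit close to the threshold keeps being sampled until it is solved or the budget runs out.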


Videos

Strategic Scaling of Test-Time Compute: A Bandit Learning Approach (SlidesLive)