TL;DR
This paper introduces a bandit learning approach for adaptive test-time compute allocation in large language models, improving efficiency by focusing compute on more challenging queries.
Contribution
The paper formulates test-time compute allocation as a bandit learning problem and develops algorithms that estimate query difficulty on the fly and prioritize compute accordingly, outperforming uniform allocation.
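The following Python sketch makes the contrast with uniform allocation concrete. It uses a simplified, hypothetical setting that is not taken from the paper. Each query i has a known per-sample success probability p_i, samples are independent, and a perfect verifier recognizes any correct sample. Under those assumptions, the expected number of solved queries with k_i samples each is the sum over i of 1 - (1 - p_i)^k_i. Handing out samples one at a time to the query with the largest marginal gain maximizes that sum. The function names are illustrative. The greedy rule is an oracle baseline, not the paper's bandit algorithm, which must learn difficulty without knowing p_i.

```python
from typing import List


def expected_solved(probs: List[float], alloc: List[int]) -> float:
    """Expected number of queries with at least one correct sample.

    Assumes independent samples and a perfect verifier (hypothetical setup).
    """
    return sum(1.0 - (1.0 - p) ** k for p, k in zip(probs, alloc))


def uniform_allocation(n_queries: int, budget: int) -> List[int]:
    """Baseline: split the budget as evenly as possible across queries."""
    base, extra = divmod(budget, n_queries)
    return [base + (1 if i < extra else 0) for i in range(n_queries)]


def greedy_oracle_allocation(probs: List[float], budget: int) -> List[int]:
    """Give each sample to the query whose solve probability rises the most.

    One more sample for query i adds p_i * (1 - p_i) ** k_i to the expected
    count. This gain shrinks as k_i grows, so greedy allocation is optimal
    here. It needs the true p_i, which a deployed system never observes.
    """
    alloc = [0] * len(probs)
    for _ in range(budget):
        gains = [p * (1.0 - p) ** k for p, k in zip(probs, alloc)]
        best = max(range(len(probs)), key=lambda i: gains[i])
        if gains[best] == 0.0:
            break  # every remaining query is unsolvable or already certain
        alloc[best] += 1
    return alloc


probs = [0.9, 0.3, 0.0]  # hypothetical: easy, hard-but-solvable, unsolvable
print(uniform_allocation(3, 6), greedy_oracle_allocation(probs, 6))
# [2, 2, 2] [2, 4, 0]
```

Consider probs = [0.9, 0.3, 0.0] with a budget of 6 samples. Uniform allocation gives [2, 2, 2], for about 1.50 expected solves. The greedy oracle gives [2, 4, 0], for about 1.75. The easy query keeps near-certain accuracy, the harder solvable query receives most of the budget, and the unsolvable query receives nothing.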
Findings
Achieves up to an 11.10% performance improvement on math benchmarks.
Reduces unnecessary compute on unsolvable queries.
Empirically validates improved efficiency on multiple datasets.
Abstract
Scaling test-time compute has emerged as an effective strategy for improving the performance of large language models. However, existing methods typically allocate compute uniformly across all queries, overlooking variation in query difficulty. To address this inefficiency, we formulate test-time compute allocation as a novel bandit learning problem and propose adaptive algorithms that estimate query difficulty on the fly and allocate compute accordingly. Compared to uniform allocation, our algorithms allocate more compute to challenging queries while maintaining accuracy on easier ones. Among challenging queries, our algorithms further learn to prioritize solvable instances, effectively reducing excessive compute on unsolvable queries. We theoretically prove that our algorithms achieve better compute efficiency than uniform allocation and empirically validate their effectiveness on…
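A much simpler online heuristic can approximate the adaptive behavior described in the abstract. The sketch below is hypothetical and is not the paper's algorithm. It assumes a verifier that reports whether each sampled answer is correct, and it stops sampling a query once that query is solved. It serves the least-sampled unsolved query next. It also parks a query after max_failures failures as "likely unsolvable". The names adaptive_allocate, make_simulated_verifier and max_failures are illustrative, and the simulated verifier stands in for a real model plus answer checker.

```python
import random
from typing import Callable, List, Sequence, Tuple


def adaptive_allocate(
    n_queries: int,
    budget: int,
    sample_is_correct: Callable[[int], bool],
    max_failures: int = 8,
) -> Tuple[List[int], List[bool]]:
    """Spend up to `budget` verifier-checked samples (hypothetical heuristic).

    Returns (samples used per query, whether each query was solved).
    - A query stops receiving compute once one sample is correct, so easy
      queries cost little and the savings flow to harder ones.
    - Among unsolved queries, the least-sampled one goes next.
    - Unsolved queries have only failures, so used[i] counts failures; once
      it reaches `max_failures` the query is parked as likely unsolvable and
      is only sampled again when no unparked query remains.
    """
    used = [0] * n_queries
    solved = [False] * n_queries
    for _ in range(budget):
        open_q = [i for i in range(n_queries) if not solved[i]]
        if not open_q:
            break  # everything solved; leave the rest of the budget unused
        active = [i for i in open_q if used[i] < max_failures]
        pool = active if active else open_q
        i = min(pool, key=lambda j: used[j])
        used[i] += 1
        if sample_is_correct(i):
            solved[i] = True
    return used, solved


def make_simulated_verifier(
    probs: Sequence[float], seed: int = 0
) -> Callable[[int], bool]:
    """Simulate one sample for query i: correct with probability probs[i]."""
    rng = random.Random(seed)
    return lambda i: rng.random() < probs[i]


if __name__ == "__main__":
    probs = [0.95, 0.9, 0.3, 0.05, 0.0]  # hypothetical per-sample success rates
    used, solved = adaptive_allocate(len(probs), 40, make_simulated_verifier(probs))
    print("samples per query:", used)
    print("solved:", solved)
```

Solved queries stop consuming samples, so easy queries finish after a sample or two and the freed budget flows to harder ones. The failure cap is a crude stand-in for the learned difficulty estimates the abstract describes, which decide which hard queries deserve more compute.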