Adaptive Test-Time Compute Allocation with Evolving In-Context Demonstrations
Bowen Zuo, Dongruo Zhou, Yinglun Zhu

TL;DR
This paper presents a test-time compute allocation method that adapts both where computation is spent and how responses are generated, using query difficulty and evolving in-context demonstrations to improve accuracy while consuming less inference compute.
Contribution
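It introduces a framework that jointly adapts compute allocation and generation. A warm-up phase identifies easy queries and builds an initial pool of question-response pairs from the test set; an adaptive phase then concentrates computation on unresolved queries and conditions their generation on evolving demonstrations.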
Findings
Outperforms existing baselines across math, coding, and reasoning tasks.
Consumes less inference compute while maintaining or improving accuracy.
Effectively identifies easy queries and focuses resources on unresolved ones.
Abstract
While scaling test-time compute can substantially improve model performance, existing approaches either rely on static compute allocation or sample from fixed generation distributions. In this work, we introduce a test-time compute allocation framework that jointly adapts where computation is spent and how generation is performed. Our method begins with a warm-up phase that identifies easy queries and assembles an initial pool of question-response pairs from the test set itself. An adaptive phase then concentrates further computation on unresolved queries while reshaping their generation distributions through evolving in-context demonstrations -- conditioning each generation on successful responses from semantically related queries rather than resampling from a fixed distribution. Experiments across math, coding, and reasoning benchmarks demonstrate that our approach consistently…
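The two phases described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how a response is judged successful, how semantic relatedness is measured, or how budgets are set, so `generate`, `verify`, `similarity`, and the parameters `warmup_samples`, `adaptive_rounds`, and `num_demos` are all hypothetical stand-ins, and the toy arithmetic task exists only to make the example runnable.

```python
from typing import Callable, Dict, List, Tuple

Demo = Tuple[str, str]                        # (question, successful response)
Generator = Callable[[str, List[Demo]], str]  # hypothetical model call
Verifier = Callable[[str, str], bool]         # hypothetical success check
Similarity = Callable[[str, str], float]      # hypothetical relatedness score


def allocate(
    queries: List[str],
    generate: Generator,
    verify: Verifier,
    similarity: Similarity,
    warmup_samples: int = 1,
    adaptive_rounds: int = 3,
    num_demos: int = 2,
) -> Tuple[Dict[str, str], int]:
    """Return (pool of solved question -> response, number of generation calls)."""
    pool: Dict[str, str] = {}
    calls = 0

    # Warm-up phase: a small budget per query; easy queries are solved here
    # and seed the pool of question-response pairs.
    for q in queries:
        for _ in range(warmup_samples):
            calls += 1
            response = generate(q, [])
            if verify(q, response):
                pool[q] = response
                break

    # Adaptive phase: spend further compute only on unresolved queries,
    # conditioning each generation on responses to the most related solved queries.
    unresolved = [q for q in queries if q not in pool]
    for _ in range(adaptive_rounds):
        if not unresolved:
            break
        still_unresolved = []
        for q in unresolved:
            ranked = sorted(pool, key=lambda s: similarity(q, s), reverse=True)
            demos = [(s, pool[s]) for s in ranked[:num_demos]]
            calls += 1
            response = generate(q, demos)
            if verify(q, response):
                pool[q] = response  # the pool evolves as new queries are solved
            else:
                still_unresolved.append(q)
        unresolved = still_unresolved

    return pool, calls


# Toy stand-ins (assumptions for illustration only): the "model" solves small
# sums unaided and larger sums only when given at least one demonstration.
def toy_generate(query: str, demos: List[Demo]) -> str:
    a, b = map(int, query.split("+"))
    return str(a + b) if (a + b < 10 or demos) else "?"


def toy_verify(query: str, response: str) -> bool:
    a, b = map(int, query.split("+"))
    return response == str(a + b)


def toy_similarity(q: str, s: str) -> float:
    return -abs(len(q) - len(s))


pool, calls = allocate(["1+2", "3+4", "20+30"], toy_generate, toy_verify, toy_similarity)
```

In this toy run the two easy queries are resolved during warm-up with one call each, and only the remaining query receives an additional, demonstration-conditioned call. A static allocation would instead spend the same budget on every query regardless of whether it was already solved.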
