Maximizing Prefix-Confidence at Test-Time Efficiently Improves Mathematical Reasoning
Matthias Otth, Jonas Hübotter, Ido Hakimi, Andreas Krause

TL;DR
This paper shows that prefix-confidence scaling at test time improves the mathematical reasoning accuracy of language models by continuing only the most promising attempt, selected by the model's own confidence in a short prefix. The method achieves a better accuracy-compute trade-off than majority voting and appears less susceptible to length biases.
Contribution
The study introduces prefix-confidence scaling at test time for mathematical reasoning and systematically evaluates it on five datasets (GSM8K, MATH500, AMC23, AIME24, and AIME25), comparing it with majority voting in terms of accuracy, compute, and susceptibility to length biases.
Findings
Prefix-confidence scaling with prefixes of only 32 tokens achieves a better accuracy-compute trade-off than majority voting.
Prefix-confidence scaling appears less susceptible to length biases than majority voting.
Test-time training with prefix-confidence outperforms the base model but not prefix-confidence scaling.
Abstract
Recent work has shown that language models can self-improve by maximizing their own confidence in their predictions, without relying on external verifiers or reward signals. In this work, we study the test-time scaling of language models for mathematical reasoning tasks, where the model's own confidence is used to select the most promising attempts. Surprisingly, we find that we can achieve significant performance gains by continuing only the most promising attempt, selected by the model's prefix-confidence. We systematically evaluate prefix-confidence scaling on five mathematical reasoning datasets: the school-level GSM8K and MATH500, and the competition-level AMC23, AIME24, and AIME25. We find that prefix-confidence scaling with prefixes of only 32 tokens achieves a better accuracy-compute trade-off than majority voting. Moreover, prefix-confidence scaling appears less susceptible…
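The selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes confidence is measured as the mean token log-probability of a prefix (the paper's exact measure may differ), and the `sample_prefix` and `continue_from` callables are hypothetical stand-ins for calls to a language model.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a sampled prefix is (tokens, per-token log-probs).
Sample = Tuple[List[str], List[float]]


def prefix_confidence(logprobs: Sequence[float]) -> float:
    """Mean token log-probability of a prefix (an assumed confidence measure)."""
    if not logprobs:
        raise ValueError("prefix must contain at least one token")
    return sum(logprobs) / len(logprobs)


def select_prefix(prefixes: Sequence[Sample]) -> int:
    """Return the index of the prefix with the highest confidence."""
    return max(range(len(prefixes)),
               key=lambda i: prefix_confidence(prefixes[i][1]))


def prefix_confidence_scaling(
    sample_prefix: Callable[[int], Sample],
    continue_from: Callable[[List[str]], List[str]],
    n: int,
    prefix_len: int = 32,
) -> List[str]:
    """Sample n short prefixes and continue only the most confident one."""
    prefixes = [sample_prefix(prefix_len) for _ in range(n)]
    best = select_prefix(prefixes)
    return continue_from(prefixes[best][0])


def generated_tokens(n: int, prefix_len: int, full_len: int) -> int:
    """Tokens generated: n prefixes plus the remainder of one full attempt."""
    return n * prefix_len + (full_len - prefix_len)
```

The saving comes from completing a single attempt instead of all of them. With illustrative numbers that are not taken from the paper (8 attempts, 32-token prefixes, 512-token solutions), `generated_tokens(8, 32, 512)` gives 736 tokens, whereas majority voting over 8 full attempts would generate 8 × 512 = 4096.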
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Mathematics, Computing, and Information Processing
