Maximizing Prefix-Confidence at Test-Time Efficiently Improves Mathematical Reasoning

Matthias Otth; Jonas Hübotter; Ido Hakimi; Andreas Krause

arXiv:2507.18122 · cs.LG · July 25, 2025

Open Access

TL;DR

This paper demonstrates that test-time prefix-confidence scaling significantly improves the mathematical reasoning accuracy of language models by continuing only the most promising attempt. It achieves a better accuracy-compute trade-off than majority voting and is less susceptible to length biases.

Contribution

The study introduces and systematically evaluates test-time prefix-confidence scaling for mathematical reasoning, showing that it is more compute-efficient than majority voting and more robust to length biases.

Findings

01

Prefix-confidence scaling achieves a better accuracy-compute trade-off than majority voting.

02

It is less susceptible to length biases than majority voting.

03

Test-time training with prefix-confidence improves on the base model but does not match prefix-confidence scaling.
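One rough way to see the compute side of the first finding is to count generated tokens. The sketch below is a back-of-the-envelope illustration under stated assumptions, not the paper's cost model: it ignores prompt tokens, assumes every full attempt has the same hypothetical length, and uses the 32-token prefix length reported in the abstract. Majority voting generates all N attempts to completion, while prefix-confidence scaling samples N short prefixes and then continues only one of them.

```python
def majority_voting_tokens(n_attempts: int, answer_len: int) -> int:
    # Every attempt is generated to completion before voting.
    return n_attempts * answer_len


def prefix_confidence_tokens(n_attempts: int, answer_len: int, prefix_len: int = 32) -> int:
    # N short prefixes are sampled, then only the chosen one is continued
    # from prefix_len tokens up to the full answer length.
    if not 0 < prefix_len <= answer_len:
        raise ValueError("prefix_len must be in (0, answer_len]")
    return n_attempts * prefix_len + (answer_len - prefix_len)


if __name__ == "__main__":
    n, length = 16, 1000  # hypothetical number of attempts and answer length
    print(majority_voting_tokens(n, length))    # 16000
    print(prefix_confidence_tokens(n, length))  # 1480
```

This counts generation cost only; whether the cheaper method is also as accurate is exactly what the paper's evaluation measures, and the sketch says nothing about that.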

Abstract

Recent work has shown that language models can self-improve by maximizing their own confidence in their predictions, without relying on external verifiers or reward signals. In this work, we study the test-time scaling of language models for mathematical reasoning tasks, where the model's own confidence is used to select the most promising attempts. Surprisingly, we find that we can achieve significant performance gains by continuing only the most promising attempt, selected by the model's prefix-confidence. We systematically evaluate prefix-confidence scaling on five mathematical reasoning datasets: the school-level GSM8K and MATH500, and the competition-level AMC23, AIME24, and AIME25. We find that prefix-confidence scaling with prefixes of only 32 tokens achieves a better accuracy-compute trade-off than majority voting. Moreover, prefix-confidence scaling appears less susceptible…
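The selection step described in the abstract can be sketched as follows, together with a plain majority-voting baseline. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the prefix-confidence score is the total log-probability the model assigns to its own prefix tokens, and the function names, the per-token log-probabilities, and the continuation callback `continue_fn` are hypothetical stand-ins for calls to a real model.

```python
from collections import Counter
from typing import Callable, Sequence


def prefix_confidence(token_logprobs: Sequence[float]) -> float:
    # Assumed confidence score: total log-probability of the prefix tokens.
    return sum(token_logprobs)


def select_most_confident(prefixes: Sequence[str],
                          logprobs: Sequence[Sequence[float]]) -> str:
    # Score each sampled prefix and return the one with the highest confidence.
    if not prefixes or len(prefixes) != len(logprobs):
        raise ValueError("need one log-probability list per prefix")
    scores = [prefix_confidence(lp) for lp in logprobs]
    best = max(range(len(prefixes)), key=lambda i: scores[i])
    return prefixes[best]


def prefix_confidence_scaling(prefixes: Sequence[str],
                              logprobs: Sequence[Sequence[float]],
                              continue_fn: Callable[[str], str]) -> str:
    # Only the single most confident prefix is continued to a full answer.
    return continue_fn(select_most_confident(prefixes, logprobs))


def majority_vote(answers: Sequence[str]) -> str:
    # Baseline: all attempts are generated in full; the most common final answer wins.
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    prefixes = ["Let x be", "First, note", "We compute"]
    logprobs = [[-1.0, -2.0], [-0.5, -0.5], [-3.0, 0.0]]  # toy values
    # Hypothetical continuation: a real system would call the model here.
    print(prefix_confidence_scaling(prefixes, logprobs, lambda p: p + " ... answer 42"))
    # prints: First, note ... answer 42
    print(majority_vote(["42", "17", "42"]))  # prints: 42
```

Because every prefix has the same fixed length (for example 32 tokens), ranking prefixes by the sum of their token log-probabilities gives the same order as ranking by the mean; a length normalization would only change the choice if prefixes differed in length.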

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Mathematics, Computing, and Information Processing