Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning
Feng Chen, Allan Raventos, Nan Cheng, Surya Ganguli, Shaul Druckmann

TL;DR
This paper investigates how limiting model confidence during training can improve large language models' mathematical reasoning performance when using test-time compute strategies like pass@N, revealing the importance of co-designing training and inference.
Contribution
It introduces a modified training loss that reduces overconfidence, aligning training with pass@N, and demonstrates improved reasoning performance on math benchmarks.
Findings
Overconfidence from cross-entropy loss impairs pass@N accuracy.
Limiting confidence during training enhances mathematical reasoning.
Modified loss improves performance on MATH and MiniF2F benchmarks.
Abstract
Recent progress in large language models (LLMs) highlights the power of scaling test-time compute to achieve strong performance on complex tasks, such as mathematical reasoning and code generation. This raises a critical question: how should model training be modified to optimize performance under a subsequent test-time compute strategy and budget? To explore this, we focus on pass@N, a simple test-time strategy that searches for a correct answer in N independent samples. We show, surprisingly, that training with cross-entropy (CE) loss can be misaligned with pass@N in that pass@N accuracy decreases with longer training. We explain the origins of this misalignment in terms of model overconfidence induced by CE, and experimentally verify our prediction of overconfidence as an impediment to scaling test-time compute via pass@N. Furthermore, we suggest a principled,…
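To make the pass@N argument concrete, here is a minimal sketch, not the paper's code. It assumes each of the N samples is drawn independently and is correct with a fixed per-problem probability p, so the chance that at least one sample is correct is 1 - (1 - p)^N. The function names and the two toy probability lists are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def pass_at_n(p: float, n: int) -> float:
    """Probability that at least one of n independent samples is correct,
    given a per-sample success probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 1.0 - (1.0 - p) ** n


def mean_pass_at_n(probs: Sequence[float], n: int) -> float:
    """Average pass@n over a set of problems."""
    return sum(pass_at_n(p, n) for p in probs) / len(probs)


# Hypothetical per-problem success probabilities (illustrative, not measured).
# Both toy models have the same pass@1 of 0.5.
overconfident = [1.0, 1.0, 0.0, 0.0]  # all mass on one answer, right or wrong
hedged = [0.5, 0.5, 0.5, 0.5]         # probability spread across answers

for n in (1, 4):
    print(n, mean_pass_at_n(overconfident, n), mean_pass_at_n(hedged, n))
```

In this toy setting the overconfident model gains nothing from extra samples, because a problem it gets wrong with certainty stays wrong on every draw. The hedged model's pass@4 rises to 0.9375 from the same pass@1. This is only an illustration of why overconfidence can limit pass@N, under the independence assumption stated above.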
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Computability, Logic, AI Algorithms · Neural Networks and Applications
