Value-Guided Search for Efficient Chain-of-Thought Reasoning
Kaiwen Wang, Jin Peng Zhou, Jonathan Chang, Zhaolin Gao, Nathan Kallus, Kianté Brantley, Wen Sun

TL;DR
This paper introduces value-guided search (VGS) for long-context reasoning models. A 1.5B token-level value model, trained on 2.5 million reasoning traces without step-level annotations, guides generation to improve both efficiency and performance at test time.
Contribution
It presents a value-guided search approach whose value model is trained on a large dataset of reasoning traces and needs no step annotations, improving reasoning efficiency and scalability.
Findings
Block-wise VGS with a final weighted majority vote scales better with test-time compute than majority voting or best-of-n.
VGS reduces the inference FLOPs needed to match the performance of majority voting.
The dataset, model, and code are publicly available.
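
The three aggregation rules compared in the findings can be illustrated with a small sketch. This is not the authors' code: the function names, the toy answers, and the toy scores are all assumptions made for illustration, and the scores stand in for whatever a value model would assign to each sampled response.

```python
from collections import Counter, defaultdict

def majority_vote(answers):
    """Return the most frequent final answer."""
    return Counter(answers).most_common(1)[0][0]

def best_of_n(answers, scores):
    """Return the answer of the single highest-scoring response."""
    best_index = max(range(len(answers)), key=lambda i: scores[i])
    return answers[best_index]

def weighted_majority_vote(answers, scores):
    """Sum the scores of responses sharing an answer; return the top total."""
    totals = defaultdict(float)
    for answer, score in zip(answers, scores):
        totals[answer] += score
    return max(totals, key=totals.get)

# Hypothetical final answers from six sampled responses, with made-up scores.
answers = ["12", "12", "15", "15", "15", "7"]
scores = [0.45, 0.40, 0.10, 0.15, 0.20, 0.50]

print(majority_vote(answers))                   # most common answer
print(best_of_n(answers, scores))               # answer of the top-scored response
print(weighted_majority_vote(answers, scores))  # answer with the largest score mass
```

On this toy input the three rules disagree, which shows why the choice of aggregation matters: majority voting ignores the scores, best-of-n trusts one response, and the weighted vote combines frequency with score.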
Abstract
In this paper, we propose a simple and efficient method for value model training on long-context reasoning traces. Compared to existing process reward models (PRMs), our method does not require a fine-grained notion of "step," which is difficult to define for long-context reasoning models. By collecting a dataset of 2.5 million reasoning traces, we train a 1.5B token-level value model and apply it to DeepSeek models for improved performance with test-time compute scaling. We find that block-wise value-guided search (VGS) with a final weighted majority vote achieves better test-time scaling than standard methods such as majority voting or best-of-n. Moreover, VGS significantly reduces the inference FLOPs required to achieve the same performance as majority voting. Our dataset, model, and codebase are open-sourced.
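
One plausible reading of block-wise value-guided search is a beam search over fixed-size blocks of tokens, where a value model scores each partial trace and only the highest-valued ones are extended. The sketch below follows that reading under stated assumptions. The abstract does not give the exact procedure, so `value_guided_search`, its parameters, and the toy generator and value function are hypothetical stand-ins for a real language model and the trained value model.

```python
import random

def value_guided_search(prompt, generate_block, value_fn, is_finished,
                        beam_width=2, samples_per_beam=4, max_blocks=8):
    """Beam search over blocks: extend each beam, keep the top-valued traces."""
    beams = [prompt]
    finished = []
    for _ in range(max_blocks):
        # Sample several candidate next blocks for every partial trace.
        candidates = [
            partial + generate_block(partial)
            for partial in beams
            for _ in range(samples_per_beam)
        ]
        # Rank candidates by the value model's score, best first.
        candidates.sort(key=value_fn, reverse=True)
        beams = []
        for candidate in candidates[:beam_width]:
            (finished if is_finished(candidate) else beams).append(candidate)
        if not beams:
            break
    # Return each completed trace with its value, usable as a vote weight.
    return [(trace, value_fn(trace)) for trace in finished]

# Toy stand-ins (assumptions): a trace is a tuple of digits, a block is
# two random digits, and the "value" is the normalized mean digit.
rng = random.Random(0)
BLOCK_SIZE = 2

def toy_generate_block(partial):
    return tuple(rng.randint(0, 9) for _ in range(BLOCK_SIZE))

def toy_value(trace):
    return sum(trace) / (9 * len(trace)) if trace else 0.0

def toy_is_finished(trace):
    return len(trace) >= 6

results = value_guided_search((), toy_generate_block, toy_value, toy_is_finished)
for trace, value in results:
    print(trace, round(value, 3))
```

Scoring whole blocks rather than individual steps is what removes the need for a definition of "step": the block boundary is just a token count. The values returned with each completed trace could then serve as weights in the final weighted majority vote.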
