Limits of PRM-Guided Tree Search for Mathematical Reasoning with LLMs

Tristan Cinquin; Geoff Pleiss; Agustinus Kristiadi

arXiv:2510.20272·cs.LG·October 24, 2025

Open Access

TL;DR

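This paper investigates whether PRM-guided tree search can improve mathematical reasoning in LLMs. It finds no statistically significant improvement over chain-of-thought prompting with Best-of-N (BoN) selection, despite higher cost, and attributes this to process reward models (PRMs) that estimate state values poorly and become less reliable as reasoning depth grows.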

Contribution

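The study proposes an adaptive algorithm for maximizing PRM scores over the intractable action space and systematically evaluates PRM-guided tree search across 23 diverse mathematical problems, using Qwen2.5-Math-7B-Instruct and its associated PRM as a case study, revealing the approach's limitations.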

Findings

01

PRM-guided tree search shows no statistically significant improvement over BoN, despite higher costs.

02

Monte Carlo tree search and beam search outperform other PRM-guided tree search methods.

03

PRMs estimate state values poorly, and their reliability degrades with reasoning depth.

Abstract

While chain-of-thought prompting with Best-of-N (BoN) selection has become popular for mathematical reasoning in large language models (LLMs), its linear structure fails to capture the branching and exploratory nature of complex problem-solving. In this work, we propose an adaptive algorithm to maximize process reward model (PRM) scores over the intractable action space, and investigate whether PRM-guided tree search can improve mathematical reasoning by exploring multiple partial solution paths. Across 23 diverse mathematical problems using Qwen2.5-Math-7B-Instruct with its associated PRM as a case study, we find that: (1) PRM-guided tree search shows no statistically significant improvements over BoN despite higher costs, (2) Monte Carlo tree search and beam search outperform other PRM-guided tree search methods, (3) PRMs poorly approximate state values and their reliability…
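To make the contrast in the abstract concrete, here is a minimal sketch of the two selection strategies it compares: BoN picks the best of several complete solutions, while beam search uses the scorer to prune partial solutions at every step. This is not the paper's adaptive algorithm. The names `toy_expand`, `toy_prm`, and `toy_done` are hypothetical stand-ins for an LLM step generator, a PRM, and a termination check, and the integer "steps" are purely illustrative.

```python
from typing import Callable, List

Path = List[int]  # a partial solution: the sequence of steps taken so far


def best_of_n(candidates: List[Path], score: Callable[[Path], float]) -> Path:
    """Best-of-N: score complete solutions once and keep the highest."""
    return max(candidates, key=score)


def beam_search(
    expand: Callable[[Path], List[int]],
    score: Callable[[Path], float],
    is_terminal: Callable[[Path], bool],
    width: int,
    max_depth: int,
) -> Path:
    """Scorer-guided beam search: keep the `width` best partial paths per step."""
    beams: List[Path] = [[]]
    for _ in range(max_depth):
        frontier: List[Path] = []
        for path in beams:
            if is_terminal(path):
                frontier.append(path)  # finished paths are carried forward
            else:
                frontier.extend(path + [step] for step in expand(path))
        if not frontier:
            break  # nothing left to expand
        beams = sorted(frontier, key=score, reverse=True)[:width]
        if all(is_terminal(p) for p in beams):
            break
    return max(beams, key=score)


# Hypothetical toy stand-ins (assumptions, not a real LLM or PRM).
def toy_expand(path: Path) -> List[int]:
    return [1, 2, 3]  # every state offers the same three candidate steps


def toy_prm(path: Path) -> float:
    return float(sum(path))  # pretend larger steps are "better" reasoning


def toy_done(path: Path) -> bool:
    return len(path) == 3  # a solution is complete after three steps


bon_choice = best_of_n([[1, 1, 1], [3, 3, 3], [2, 2, 2]], toy_prm)
beam_choice = beam_search(toy_expand, toy_prm, toy_done, width=2, max_depth=3)
```

In this toy setting the scorer is perfectly informative, so both strategies agree. The sketch also shows why tree search leans harder on the scorer: it is called on every partial path at every depth, so errors in intermediate scores can prune good paths early.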

Taxonomy

Topics: Machine Learning in Materials Science · Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling