BFS-PO: Best-First Search for Large Reasoning Models

Fiorenzo Parascandolo; Wenhui Tan; Enver Sangineto; Ruihua Song; Rita Cucchiara

arXiv:2602.14917·cs.CL·February 17, 2026

Open Access

TL;DR

This paper introduces BFS-PO, a new RL algorithm that improves large reasoning models by reducing overthinking and verbosity, leading to more accurate and concise reasoning outputs.

Contribution

BFS-PO employs a Best-First Search strategy with backtracking to enhance reasoning efficiency and output conciseness in large reasoning models.

Findings

01. Increases model accuracy on various benchmarks.

02. Reduces answer length and verbosity.

03. Enhances reasoning efficiency with shorter responses.

Abstract

Large Reasoning Models (LRMs) such as OpenAI o1 and DeepSeek-R1 have shown excellent performance in reasoning tasks using long reasoning chains. However, this has also led to a significant increase in computational costs and to the generation of verbose output, a phenomenon known as overthinking. The tendency to overthink is often exacerbated by Reinforcement Learning (RL) algorithms such as GRPO/DAPO. In this paper, we propose BFS-PO, an RL algorithm that alleviates this problem using a Best-First Search exploration strategy. Specifically, BFS-PO looks for the shortest correct answer using a backtracking mechanism based on maximum entropy nodes. By generating progressively shorter responses during training, BFS-PO learns to produce concise reasoning chains. Using different benchmarks and base LRMs, we show that BFS-PO can simultaneously increase the LRM accuracy and shorten its…
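
The backtracking idea described in the abstract can be illustrated with a toy sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration and not the authors' method: the names `token_entropy`, `max_entropy_position`, and `shortest_correct_search`, the `continue_from` and `is_correct` callbacks, the rollout format, and the `budget` parameter are all hypothetical, and the policy-optimization (RL update) step is omitted entirely. The sketch keeps a priority queue of rollouts ordered by correctness and then length, re-expands the best one from its highest-entropy step, and returns the shortest correct response seen.

```python
import heapq
import math
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical rollout format: (tokens, one next-token distribution per token).
Rollout = Tuple[List[str], List[List[float]]]


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def max_entropy_position(dists: Sequence[Sequence[float]]) -> int:
    """Index of the step where the (assumed) model was least certain."""
    return max(range(len(dists)), key=lambda i: token_entropy(dists[i]))


def shortest_correct_search(
    continue_from: Callable[[List[str]], Rollout],
    is_correct: Callable[[List[str]], bool],
    budget: int = 8,
) -> Optional[List[str]]:
    """Toy best-first search for a short correct response (illustrative only)."""
    frontier: list = []  # min-heap: correct rollouts first, then shorter ones
    pushes = 0           # unique tie-breaker so the heap never compares lists

    def push(tokens: List[str], dists: List[List[float]]) -> None:
        nonlocal pushes
        rank = 0 if is_correct(tokens) else 1
        heapq.heappush(frontier, (rank, len(tokens), pushes, tokens, dists))
        pushes += 1

    # Initial rollout from the empty prefix.
    tokens, dists = continue_from([])
    best = tokens if is_correct(tokens) else None
    push(tokens, dists)

    for _ in range(budget):
        if not frontier:
            break
        _, _, _, tokens, dists = heapq.heappop(frontier)
        if not dists:
            continue  # nothing to backtrack to in an empty rollout
        # Backtrack to the highest-entropy step and resample from there.
        cut = max_entropy_position(dists)
        prefix = tokens[:cut]
        suffix, suffix_dists = continue_from(prefix)
        new_tokens = prefix + suffix
        new_dists = list(dists[:cut]) + list(suffix_dists)
        if is_correct(new_tokens) and (best is None or len(new_tokens) < len(best)):
            best = new_tokens
        # Keep both parent and child so the best node is always expanded next.
        push(tokens, dists)
        push(new_tokens, new_dists)
    return best
```

In a real setting `continue_from` would sample a continuation from the model given a prefix, and `is_correct` would be an answer verifier; here both are left as caller-supplied functions so the search logic can be read on its own.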

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques