The Bidirectional Process Reward Model

Lingyin Zhang; Jun Gao; Xiaoxue Ren; Ziqiang Cao

arXiv:2508.01682 · cs.CL · January 7, 2026

Open Access

TL;DR

The paper introduces BiPRM, a bidirectional process reward model that evaluates reasoning steps in both directions to improve the accuracy and robustness of reward assessments in large language models.

Contribution

The paper proposes a bidirectional evaluation paradigm with a simple gating mechanism, improving process reward modeling while adding minimal parameters and latency.

Findings

1. BiPRM outperforms unidirectional models across multiple benchmarks.
2. BiPRM achieves an average 10.6% relative gain in solution quality.
3. BiPRM demonstrates robustness and broad applicability in diverse settings.

Abstract

Process Reward Models (PRMs), which assign fine-grained scores to intermediate reasoning steps within a solution trajectory, have emerged as a promising approach to enhance the reasoning quality of Large Language Models (LLMs). However, most existing PRMs rely on a unidirectional left-to-right (L2R) evaluation scheme, which restricts their utilization of global context. In light of this challenge, we propose a novel bidirectional evaluation paradigm, named Bidirectional Process Reward Model (BiPRM). BiPRM incorporates a parallel right-to-left (R2L) evaluation stream, implemented via prompt reversal, alongside the conventional L2R flow. Then a gating mechanism is introduced to adaptively fuse the reward scores from both streams to yield a holistic quality assessment. Remarkably, compared to the original PRM, BiPRM introduces only a 0.3% parameter increase for the gating module, and the…
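The fusion idea described in the abstract can be illustrated with a small sketch. The abstract does not specify the gate's form, so everything here is an assumption made for illustration: the function names `reverse_steps` and `fuse_scores`, the scalar sigmoid gate, and its parameters `w_l2r`, `w_r2l`, and `bias` are hypothetical and are not taken from the paper. The sketch also assumes that the R2L stream returns its scores in reversed step order, so they are realigned before fusion.

```python
import math
from typing import List, Sequence


def reverse_steps(steps: Sequence[str]) -> List[str]:
    """Return the reasoning steps in right-to-left order (hypothetical helper)."""
    return list(reversed(steps))


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fuse_scores(
    l2r_scores: Sequence[float],
    r2l_scores_reversed: Sequence[float],
    w_l2r: float = 1.0,   # assumed gate weight, not from the paper
    w_r2l: float = 1.0,   # assumed gate weight, not from the paper
    bias: float = 0.0,    # assumed gate bias, not from the paper
) -> List[float]:
    """Fuse per-step rewards from the two streams with a scalar sigmoid gate.

    r2l_scores_reversed is assumed to be ordered last step first, so it is
    flipped back to the original step order before being combined.
    """
    r2l_scores = list(reversed(r2l_scores_reversed))
    if len(l2r_scores) != len(r2l_scores):
        raise ValueError("both streams must score the same number of steps")

    fused = []
    for s_l2r, s_r2l in zip(l2r_scores, r2l_scores):
        # The gate decides how much weight the L2R score gets for this step.
        gate = _sigmoid(w_l2r * s_l2r + w_r2l * s_r2l + bias)
        fused.append(gate * s_l2r + (1.0 - gate) * s_r2l)
    return fused


if __name__ == "__main__":
    steps = ["step 1", "step 2", "step 3"]
    print(reverse_steps(steps))
    print(fuse_scores([0.9, 0.2, 0.7], [0.6, 0.4, 0.8]))
```

Because the fused value is a convex combination, each step's reward always lies between its L2R and R2L scores. In a trained model the gate parameters would be learned rather than fixed, and the gate input could be richer than the two scalar scores.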

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education