Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision

Tej Deep Pala; Panshul Sharma; Amir Zadeh; Chuan Li; Soujanya Poria

arXiv:2505.19706 · cs.CL · May 27, 2025

TL;DR

This paper introduces PathFinder-PRM, an error-aware hierarchical process reward model that improves mathematical reasoning by classifying errors at each step, leading to better reward estimation and data efficiency.

Contribution

The paper presents a novel hierarchical, error-aware discriminative PRM that classifies errors at each step and combines signals for improved reward modeling in mathematical reasoning.
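The two-stage idea described above (type the errors first, then derive step correctness from them) can be illustrated with a minimal sketch. This is not the authors' implementation: the `StepErrorSignals` type, the `step_reward` function, and in particular the product rule used to combine the signals are assumptions made for illustration, since this page does not say how PathFinder-PRM combines them.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class StepErrorSignals:
    """Hypothetical per-step outputs of the first (error-typing) stage."""

    p_math_error: float         # probability the step contains a math error
    p_consistency_error: float  # probability the step conflicts with earlier context


def step_reward(signals: StepErrorSignals) -> float:
    """Second stage: fold the error-type signals into one correctness score.

    Assumption: a step counts as correct only if it is free of both error
    types, and the two are treated as independent. The product rule below is
    a placeholder, not the combination PathFinder-PRM actually learns.
    """
    for p in (signals.p_math_error, signals.p_consistency_error):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return (1.0 - signals.p_math_error) * (1.0 - signals.p_consistency_error)


def score_solution(steps: Sequence[StepErrorSignals]) -> List[float]:
    """Score every step of a solution independently, in order."""
    return [step_reward(s) for s in steps]
```

Keeping the error-typing outputs as an explicit intermediate value means a low reward can be traced back to the error type that caused it, which is the practical appeal of typing errors before scoring.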

Findings

01. PathFinder-PRM achieves a new state-of-the-art PRMScore of 67.7 on PRMBench.

02. The model outperforms the prior best PRMScore (65.5) while using three times less training data.

03. Reward-guided greedy search with PathFinder-PRM improves prm@8 by 1.5 points.

Abstract

Large Language Models (LLMs) are prone to hallucination, especially during multi-hop and reasoning-intensive tasks such as mathematical problem solving. While Outcome Reward Models verify only final answers, Process Reward Models (PRMs) score each intermediate step to steer generation toward coherent solutions. We introduce PathFinder-PRM, a novel hierarchical, error-aware discriminative PRM that first classifies math and consistency errors at each step, then combines these fine-grained signals to estimate step correctness. To train PathFinder-PRM, we construct a 400K-sample dataset by enriching the human-annotated PRM800K corpus and RLHFlow Mistral traces with three-dimensional step-level labels. On PRMBench, PathFinder-PRM achieves a new state-of-the-art PRMScore of 67.7, outperforming the prior best (65.5) while using 3 times less data. When applied to reward-guided greedy search,…
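Reward-guided greedy search, mentioned at the end of the abstract, can be sketched in generic form: at each step, propose several candidate next steps, score each with the PRM, and keep the best one. The sketch below assumes this generic form and is not the paper's exact procedure. The `propose` and `score` callables, the `stop_marker` convention, and the default of 8 candidates (chosen only to echo the "prm@8" metric name) are all hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces: a generator that proposes candidate next steps,
# and a PRM that scores one candidate given the question and the steps so far.
ProposeFn = Callable[[str, Sequence[str], int], List[str]]
ScoreFn = Callable[[str, Sequence[str], str], float]


def reward_guided_greedy_search(
    question: str,
    propose: ProposeFn,
    score: ScoreFn,
    n_candidates: int = 8,        # assumption, echoing the "prm@8" metric name
    max_steps: int = 16,          # assumption: hard cap on solution length
    stop_marker: str = "ANSWER:",  # assumption: marks a final-answer step
) -> List[str]:
    """Build a solution step by step, keeping the highest-reward candidate."""
    steps: List[str] = []
    for _ in range(max_steps):
        candidates = propose(question, steps, n_candidates)
        if not candidates:
            break
        # Greedy choice: commit to the single best-scoring candidate step.
        best = max(candidates, key=lambda c: score(question, steps, c))
        steps.append(best)
        if stop_marker in best:
            break
    return steps
```

Because the search commits to one step at a time and never backtracks, the quality of the final solution depends directly on how well the reward model ranks individual steps.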

Code & Models

Repositories

declare-lab/pathfinder-prm
PyTorch · Official

Models

declare-lab/PathFinder-PRM-7B
model · 34 downloads · 5 likes

Datasets

declare-lab/PathFinder-600K
dataset · 116 downloads

Videos

Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision · Underline

Taxonomy

Topics: Business Process Modeling and Analysis