Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision
Tej Deep Pala, Panshul Sharma, Amir Zadeh, Chuan Li, Soujanya Poria

TL;DR
This paper introduces PathFinder-PRM, an error-aware hierarchical process reward model that improves mathematical reasoning by classifying errors at each step, leading to better reward estimation and data efficiency.
Contribution
The paper presents a novel hierarchical, error-aware discriminative PRM that classifies errors at each step and combines signals for improved reward modeling in mathematical reasoning.
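The two-stage idea described here (classify error types for a step, then combine them into one correctness estimate) can be illustrated with a minimal sketch. The names `ErrorSignals`, `combine_signals`, and `score_steps` are hypothetical, and the product rule is an illustrative stand-in that assumes the two error types are independent. The paper's model learns this combination; this is not its actual scoring function.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class ErrorSignals:
    """Hypothetical per-step error probabilities from a first-stage classifier."""
    p_math_error: float         # probability the step contains a math error
    p_consistency_error: float  # probability the step contradicts earlier context


def combine_signals(signals: ErrorSignals) -> float:
    """Toy combiner (assumption): a step is correct only if it has neither error type.

    Treats the two error types as independent, which a learned model need not do.
    """
    return (1.0 - signals.p_math_error) * (1.0 - signals.p_consistency_error)


def score_steps(
    steps: Sequence[str],
    classify: Callable[[str], ErrorSignals],
) -> List[float]:
    """Stage 1: classify errors per step. Stage 2: combine into a step reward."""
    return [combine_signals(classify(step)) for step in steps]
```

Keeping the error signals as an explicit intermediate value makes it possible to inspect why a step received a low reward, rather than seeing only a single scalar.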
Findings
PathFinder-PRM achieves a new state-of-the-art PRMScore of 67.7 on PRMBench.
The model outperforms the prior best PRMScore (65.5) while using one third as much training data.
Reward-guided greedy search with PathFinder-PRM improves prm@8 by 1.5 points.
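Reward-guided greedy search, as named in the last finding, can be sketched as follows: at each step, draw several candidate next steps, score each with the reward model, and keep the best one. The callables `propose`, `score`, and `is_final` are hypothetical placeholders for a generator, a PRM, and a stopping check. The defaults of 8 candidates and 16 steps are assumptions for illustration, not settings confirmed by the source.

```python
from typing import Callable, List, Sequence


def greedy_search(
    question: str,
    propose: Callable[[str, Sequence[str], int], List[str]],
    score: Callable[[str, Sequence[str], str], float],
    is_final: Callable[[str], bool],
    num_candidates: int = 8,   # assumed value, not taken from the paper
    max_steps: int = 16,       # assumed safety cap on solution length
) -> List[str]:
    """Build a solution one step at a time, always keeping the top-scored candidate."""
    steps: List[str] = []
    for _ in range(max_steps):
        candidates = propose(question, steps, num_candidates)
        if not candidates:
            break  # generator has nothing more to offer
        # Score each candidate in the context of the steps chosen so far.
        best = max(candidates, key=lambda c: score(question, steps, c))
        steps.append(best)
        if is_final(best):
            break  # the chosen step is a final answer
    return steps
```

Because the search commits to one step at a time and never backtracks, the quality of the result depends directly on how well the reward model ranks candidate steps.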
Abstract
Large Language Models (LLMs) are prone to hallucination, especially during multi-hop and reasoning-intensive tasks such as mathematical problem solving. While Outcome Reward Models verify only final answers, Process Reward Models (PRMs) score each intermediate step to steer generation toward coherent solutions. We introduce PathFinder-PRM, a novel hierarchical, error-aware discriminative PRM that first classifies math and consistency errors at each step, then combines these fine-grained signals to estimate step correctness. To train PathFinder-PRM, we construct a 400K-sample dataset by enriching the human-annotated PRM800K corpus and RLHFlow Mistral traces with three-dimensional step-level labels. On PRMBench, PathFinder-PRM achieves a new state-of-the-art PRMScore of 67.7, outperforming the prior best (65.5) while using 3 times less data. When applied to reward guided greedy search,…
Taxonomy
Topics: Business Process Modeling and Analysis
