ReasonGRM: Enhancing Generative Reward Models through Large Reasoning Models

Bin Chen; Xinzge Gao; Chuanrui Hu; Penghang Yu; Hua Zhang; Bing-Kun Bao

arXiv:2506.16712·cs.CL·June 23, 2025

TL;DR

ReasonGRM introduces a three-stage framework for generative reward models that improves reasoning quality, reduces hallucinations, and achieves state-of-the-art benchmark performance.

Contribution

The paper presents a three-stage training process for reward modeling: reasoning-path generation, a new evaluation metric for scoring those paths, and reinforcement learning.

Findings

01. Outperforms previous GRMs by 1.8% on average

02. Surpasses proprietary models like GPT-4o by up to 5.6%

03. Demonstrates the importance of reasoning-aware training

Abstract

Generative Reward Models (GRMs) provide greater flexibility than scalar reward models in capturing human preferences, but their effectiveness is limited by poor reasoning capabilities. This often results in incomplete or overly speculative reasoning paths, leading to hallucinations or missing key information in complex tasks. We address this challenge with ReasonGRM, a three-stage generative reward modeling framework. In the first stage, Zero-RL is used to generate concise, outcome-directed reasoning paths that reduce the likelihood of critical omissions. In the second stage, we introduce a novel evaluation metric, $R^{\star}$, which scores reasoning paths based on their generation likelihood. This favors paths that reach correct answers with minimal exploration, helping to reduce hallucination-prone data during training. In the final stage, the model is further refined through…
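The second stage can be pictured as a likelihood-based filter over candidate reasoning paths. The abstract does not give the formula for $R^{\star}$, so the sketch below uses a stand-in score, the mean token log-probability of a path, purely for illustration. The names `ReasoningPath`, `likelihood_score`, and `select_paths` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReasoningPath:
    """A candidate reasoning path (hypothetical container, not from the paper)."""
    text: str
    token_logprobs: List[float]  # per-token log-probabilities from some scoring model
    is_correct: bool             # whether the path ends in the correct answer


def likelihood_score(path: ReasoningPath) -> float:
    """Stand-in for R*: mean token log-probability of the path.

    ASSUMPTION: this is not the paper's metric, only a simple
    likelihood-based score used to illustrate the filtering idea.
    """
    if not path.token_logprobs:
        return float("-inf")
    return sum(path.token_logprobs) / len(path.token_logprobs)


def select_paths(paths: List[ReasoningPath], keep: int) -> List[ReasoningPath]:
    """Keep the `keep` highest-scoring paths among those that reach the correct answer."""
    correct = [p for p in paths if p.is_correct]
    return sorted(correct, key=likelihood_score, reverse=True)[:keep]


if __name__ == "__main__":
    candidates = [
        ReasoningPath("direct path", [-0.1, -0.2], True),
        ReasoningPath("wandering path", [-1.0, -2.0, -3.0], True),
        ReasoningPath("confident but wrong", [-0.05], False),
    ]
    for p in select_paths(candidates, keep=1):
        print(p.text, likelihood_score(p))
```

In this toy setup, incorrect paths are dropped first and the remaining ones are ranked by the stand-in score, which mirrors the abstract's stated goal of favoring paths that reach correct answers with little exploration. How the actual metric weighs likelihood is left to the paper.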

Taxonomy

Topics: Recommender Systems and Techniques · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Games