Reward Modeling from Natural Language Human Feedback

Zongqi Wang; Rui Wang; Yuchuan Wu; Yiyao Yu; Pinyi Zhang; Shaoning Sun; Yujiu Yang; Yongbin Li

arXiv:2601.07349·cs.CL·May 4, 2026

TL;DR

This paper introduces RM-NLHF, a method that uses natural language human feedback to train generative reward models with reinforcement learning. It targets the noise in binary outcome-based rewards, which credit a correct preference label even when the critique behind it is unsound.

Contribution

It proposes using natural language human critiques as process reward signals for generative reward models, together with a Meta Reward Model (MetaRM) that extends those process rewards to data without human critiques.

Findings

01. RM-NLHF outperforms outcome-only reward models on benchmarks.

02. Natural language feedback provides more accurate reward signals.

03. MetaRM generalizes process rewards to data without human critiques.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) on preference data has become the mainstream approach for training Generative Reward Models (GRMs). Typically, in pairwise rewarding tasks, GRMs generate reasoning chains ending with critiques and preference labels, and RLVR then relies on the correctness of the preference labels as the training reward. However, in this paper, we demonstrate that such binary classification tasks make GRMs susceptible to guessing correct outcomes without sound critiques. Consequently, these spurious successes introduce substantial noise into the reward signal, thereby impairing the effectiveness of reinforcement learning. To address this issue, we propose Reward Modeling from Natural Language Human Feedback (RM-NLHF), which leverages natural language feedback to obtain process reward signals, thereby mitigating the problem of limited solution space…
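The guessing problem described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract is truncated before it says how the process reward is computed, so the token-overlap score below is only an assumed stand-in for comparing a generated critique against a human critique, and all function names (`outcome_reward`, `token_f1`, `combined_reward`, `guess_rate`) are hypothetical.

```python
import random


def outcome_reward(predicted: str, gold: str) -> float:
    """Binary outcome reward: 1.0 if the preference label is correct, else 0.0."""
    return 1.0 if predicted == gold else 0.0


def token_f1(critique: str, reference: str) -> float:
    """Crude token-overlap F1 between two critiques.

    ASSUMPTION: a placeholder for whatever critique-agreement measure
    the paper actually uses; the abstract does not specify it.
    """
    a = set(critique.lower().split())
    b = set(reference.lower().split())
    common = len(a & b)
    if common == 0:
        return 0.0
    precision, recall = common / len(a), common / len(b)
    return 2 * precision * recall / (precision + recall)


def combined_reward(predicted: str, gold: str,
                    critique: str, human_critique: str) -> float:
    """Hypothetical reward: the outcome reward scaled by critique agreement,
    so a correct label with an unrelated critique earns little or nothing."""
    return outcome_reward(predicted, gold) * token_f1(critique, human_critique)


def guess_rate(n_trials: int = 10_000, seed: int = 0) -> float:
    """Average outcome reward of a policy that picks 'A' or 'B' at random."""
    rng = random.Random(seed)
    hits = sum(outcome_reward(rng.choice("AB"), "A") for _ in range(n_trials))
    return hits / n_trials


if __name__ == "__main__":
    human = "wrong date in paragraph two"
    # A random guesser still collects roughly half of the outcome reward.
    print("random-guess outcome reward:", guess_rate())
    # Correct label, unrelated critique: full outcome reward, no combined reward.
    print(outcome_reward("A", "A"),
          combined_reward("A", "A", "it seems better overall", human))
```

With only two possible labels, a policy that ignores the responses entirely is right about half the time, which is the kind of spurious success the abstract says adds noise to the training signal. Scaling by critique agreement is one simple way to withhold credit in those cases; other combinations (additive bonuses, thresholds) would be equally plausible under the same assumptions.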

Code & Models

Datasets

Tongyi-ConvAI/RM-NLHF
dataset· 32 dl