Reward Modeling from Natural Language Human Feedback
Zongqi Wang, Rui Wang, Yuchuan Wu, Yiyao Yu, Pinyi Zhang, Shaoning Sun, Yujiu Yang, Yongbin Li

TL;DR
This paper introduces RM-NLHF, a method that uses natural language human feedback to improve reward modeling for reinforcement learning. It targets the noise in binary, outcome-based rewards, which credit a model for a correct preference label even when its critique is unsound.
Contribution
It proposes using natural language human critiques as process reward signals, together with a Meta Reward Model (MetaRM), to improve the accuracy and generalization of generative reward models.
Findings
RM-NLHF outperforms outcome-only reward models on the evaluated benchmarks.
Natural language feedback provides more accurate reward signals.
MetaRM generalizes process rewards to data without human critiques.
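The summary does not say how the process reward is computed or how MetaRM is used. The sketch below only illustrates the general idea of blending an outcome reward with a process reward, and of falling back to a learned scorer when no human critique exists. Everything in it is hypothetical and is not the paper's method: token-overlap F1 stands in for critique agreement, `stub_meta_rm` stands in for a trained MetaRM, and the weight `alpha` is an arbitrary illustrative choice.

```python
from collections import Counter
from typing import Callable, Optional


def token_f1(pred: str, ref: str) -> float:
    """Token-overlap F1 between two texts (a crude stand-in for critique agreement)."""
    p, r = pred.lower().split(), ref.lower().split()
    if not p or not r:
        return 0.0
    overlap = sum((Counter(p) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def combined_reward(
    critique: str,
    label_correct: bool,
    human_critique: Optional[str],
    meta_rm: Callable[[str], float],
    alpha: float = 0.5,
) -> float:
    """Hypothetical blend of an outcome reward and a process reward.

    outcome: 1.0 if the preference label is correct, else 0.0.
    process: agreement with the human critique when one exists; otherwise
             the score of a MetaRM-like scorer (any callable returning [0, 1]).
    alpha:   weight on the process term (an illustrative choice).
    """
    outcome = 1.0 if label_correct else 0.0
    if human_critique is not None:
        process = token_f1(critique, human_critique)
    else:
        process = meta_rm(critique)
    return (1 - alpha) * outcome + alpha * process


def stub_meta_rm(critique: str) -> float:
    """Placeholder for a trained MetaRM; always returns the same score."""
    return 0.8


human = "response b ignores the unit conversion"
print(combined_reward("response b ignores the unit conversion", True, human, stub_meta_rm))  # 1.0
print(combined_reward("both look fine", True, human, stub_meta_rm))  # 0.5
print(combined_reward("both look fine", True, None, stub_meta_rm))  # uses the stub scorer
```

Under an outcome-only reward, the vague critique "both look fine" with a correct label would score 1.0; the blended reward drops it to 0.5, which is the kind of signal a process reward is meant to add.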
Abstract
Reinforcement Learning with Verifiable Rewards (RLVR) on preference data has become the mainstream approach for training Generative Reward Models (GRMs). In pairwise reward tasks, GRMs typically generate reasoning chains that end with a critique and a preference label, and RLVR uses the correctness of that label as the training reward. However, in this paper, we demonstrate that such binary classification tasks make GRMs susceptible to guessing the correct outcome without a sound critique. These spurious successes introduce substantial noise into the reward signal and thereby impair the effectiveness of reinforcement learning. To address this issue, we propose Reward Modeling from Natural Language Human Feedback (RM-NLHF), which leverages natural language feedback to obtain process reward signals, thereby mitigating the problem of limited solution space…
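The failure mode described in the abstract can be made concrete with a minimal sketch of an outcome-only RLVR reward for pairwise judging. This is not the authors' code: the `[[A]]`/`[[B]]` label format and the names `parse_preference` and `outcome_reward` are hypothetical. The point is that the reward inspects only the final label, so an empty "critique" scores the same as a sound one, and a policy that guesses uniformly still collects about 0.5 reward on average.

```python
import random
import re
from typing import Optional


def parse_preference(generation: str) -> Optional[str]:
    """Return the last '[[A]]' or '[[B]]' label in a GRM output (hypothetical format)."""
    matches = re.findall(r"\[\[(A|B)\]\]", generation)
    return matches[-1] if matches else None


def outcome_reward(generation: str, gold_label: str) -> float:
    """Binary outcome reward: 1.0 if the final label matches the gold label.

    The critique text before the label is never inspected.
    """
    return 1.0 if parse_preference(generation) == gold_label else 0.0


# A sound critique and a bare guess receive the same reward.
sound = "Response A cites the correct theorem, while B miscomputes the sum. [[A]]"
guess = "[[A]]"
print(outcome_reward(sound, "A"), outcome_reward(guess, "A"))  # 1.0 1.0

# A policy that ignores the inputs and guesses uniformly still earns
# roughly 0.5 reward in the pairwise setting: a noisy, spurious signal.
rng = random.Random(0)
golds = [rng.choice("AB") for _ in range(10_000)]
guesses = [f"[[{rng.choice('AB')}]]" for _ in golds]
mean_reward = sum(outcome_reward(g, y) for g, y in zip(guesses, golds)) / len(golds)
print(round(mean_reward, 2))
```

Because the label space has only two values, roughly half of all rewarded rollouts from a guessing policy are spurious successes, which is the noise that process rewards from natural language feedback are meant to reduce.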
Code & Models
- 🤗 Tongyi-ConvAI/Baseline-Outcome-Reward-Qwen-7B (model)
- 🤗 Tongyi-ConvAI/Cold-Start-MetaRM-RM-NLHF-Qwen-32B (model)
- 🤗 Tongyi-ConvAI/Cold-Start-MetaRM-RM-NLHF-Qwen-7B (model)
- 🤗 Tongyi-ConvAI/Final-MetaRM-RM-NLHF-Qwen-32B (model)
- 🤗 Tongyi-ConvAI/Final-MetaRM-RM-NLHF-Qwen-7B (model)
- 🤗 Tongyi-ConvAI/RM-NLHF-Qwen-32B (model)
- 🤗 Tongyi-ConvAI/RM-NLHF-Qwen-7B (model)
