Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
Hao Peng, Yunjia Qi, Xiaozhi Wang, Zijun Yao, Bin Xu, Lei Hou, Juanzi Li

TL;DR
This paper introduces agentic reward modeling, which combines human preference rewards with verifiable correctness signals, such as factuality and instruction following, to produce more reliable rewards for large language models. The resulting system, RewardAgent, significantly outperforms vanilla reward models on existing reward model benchmarks and in best-of-n search on downstream tasks.
Contribution
The paper proposes a novel reward system that integrates human preferences with verifiable correctness signals, enhancing reward reliability for LLM training.
Findings
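RewardAgent significantly outperforms vanilla reward models on existing reward model benchmarks and in inference-time best-of-n search on real-world downstream tasks.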
Training LLMs with RewardAgent improves NLP benchmark scores.
Combining human preference rewards with verifiable correctness signals (factuality and instruction following) yields more reliable rewards.
Abstract
Reward models (RMs) are crucial for the training and inference-time scaling up of large language models (LLMs). However, existing reward models primarily focus on human preferences, neglecting verifiable correctness signals which have shown strong potential in training LLMs. In this paper, we propose agentic reward modeling, a reward system that combines reward models with verifiable correctness signals from different aspects to provide reliable rewards. We empirically implement a reward agent, named RewardAgent, that combines human preference rewards with two verifiable signals: factuality and instruction following, to provide more reliable rewards. We conduct comprehensive experiments on existing reward model benchmarks and inference time best-of-n searches on real-world downstream tasks. RewardAgent significantly outperforms vanilla reward models, demonstrating its effectiveness. We…
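The idea of combining a preference reward with verifiable checks, then using the result for best-of-n selection, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not say how RewardAgent aggregates its signals, so the weighted sum, the `CombinedReward` class, the `best_of_n` helper, and all three toy scorers are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A scorer maps (instruction, response) to a float.
# Every scorer in this sketch is a hypothetical stand-in, not a real model.
Scorer = Callable[[str, str], float]


@dataclass
class CombinedReward:
    preference: Scorer            # stand-in for a learned human-preference reward model
    verifiers: Dict[str, Scorer]  # stand-ins for verifiable checks, each returning 0.0 or 1.0
    weight: float = 1.0           # assumed weight applied to every verifier score

    def score(self, instruction: str, response: str) -> float:
        # Assumed aggregation: preference score plus a weighted sum of verifier scores.
        total = self.preference(instruction, response)
        for check in self.verifiers.values():
            total += self.weight * check(instruction, response)
        return total


def best_of_n(reward: CombinedReward, instruction: str, candidates: List[str]) -> str:
    # Best-of-n search: return the candidate with the highest combined reward.
    return max(candidates, key=lambda c: reward.score(instruction, c))


def toy_preference(instruction: str, response: str) -> float:
    # Toy proxy for a preference model: longer answers score higher, capped at 1.0.
    return min(len(response.split()) / 20, 1.0)


def toy_word_limit(instruction: str, response: str) -> float:
    # Toy instruction-following check: the response must use at most 10 words.
    return 1.0 if len(response.split()) <= 10 else 0.0


def toy_factuality(instruction: str, response: str) -> float:
    # Toy factuality check against a single known fact.
    return 1.0 if "Paris is the capital" in response or "is Paris" in response else 0.0


instruction = "In at most 10 words, name the capital of France."
candidates = [
    "Lyon is the capital of France.",
    "Paris is the capital of France.",
    "The capital of France is Paris, a city known for its museums, cafes, and long history.",
]

reward = CombinedReward(
    preference=toy_preference,
    verifiers={"instruction_following": toy_word_limit, "factuality": toy_factuality},
)

chosen = best_of_n(reward, instruction, candidates)
preference_only = max(candidates, key=lambda c: toy_preference(instruction, c))
print("combined reward picks:", chosen)
print("preference alone picks:", preference_only)
```

In this toy setup the preference proxy alone favors the longest answer even though it breaks the word limit, while the combined score selects the short, correct answer. That mirrors the motivation stated in the abstract, under the assumptions listed above.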
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Recommender Systems and Techniques
Methods: Direct Preference Optimization · Focus
