RewardBench 2: Advancing Reward Model Evaluation

Saumya Malik; Valentina Pyatkin; Sander Land; Jacob Morrison; Noah A. Smith; Hannaneh Hajishirzi; Nathan Lambert

arXiv:2506.01937 · cs.CL · April 24, 2026

TL;DR

RewardBench 2 is a new multi-skill benchmark for reward model evaluation. It provides more challenging data, and scores on it correlate better with downstream performance in language model training.

Contribution

The paper introduces RewardBench 2, a benchmark built on new human prompts, to make reward model evaluation more rigorous and more relevant to downstream applications.

Findings

01. Models score about 20 points lower on average on RewardBench 2 than on RewardBench.

02. Performance on RewardBench 2 correlates with downstream inference and training outcomes.

Abstract

Reward models are used throughout the post-training of language models to capture nuanced signals from preference data and provide a training target for optimization across instruction following, reasoning, safety, and other domains. The community has begun establishing best practices for evaluating reward models, from benchmarks that test capabilities in specific skill areas to others that test agreement with human preferences. At the same time, progress in evaluation has not been mirrored by the effectiveness of reward models in downstream tasks -- simpler direct alignment algorithms are reported to work better in many cases. This paper introduces RewardBench 2, a new multi-skill reward modeling benchmark designed to bring new, challenging data for accuracy-based reward model evaluation -- models score about 20 points lower on average on RewardBench 2 than on the…
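To make "accuracy-based reward model evaluation" concrete, here is a minimal sketch in Python. It rests on assumptions not stated in this summary: each prompt is taken to have one preferred ("chosen") response and several "rejected" ones, and a prompt counts as correct only when the reward model scores the chosen response strictly above every rejected one. The names `is_correct`, `accuracy`, and `toy_scores` are hypothetical, and the scores are made-up numbers for illustration, not results from the paper.

```python
from typing import Iterable, Sequence, Tuple


def is_correct(chosen_score: float, rejected_scores: Sequence[float]) -> bool:
    """True when the chosen response strictly outscores every rejected one.

    Ties count as failures (an assumption of this sketch).
    """
    return all(chosen_score > r for r in rejected_scores)


def accuracy(examples: Iterable[Tuple[float, Sequence[float]]]) -> float:
    """Fraction of prompts on which the chosen response is ranked first."""
    results = [is_correct(chosen, rejected) for chosen, rejected in examples]
    if not results:
        raise ValueError("no examples to score")
    return sum(results) / len(results)


# Made-up reward scores: (chosen_score, [rejected_scores...]) per prompt.
toy_scores = [
    (0.9, [0.1, 0.2, 0.3]),  # correct: chosen beats all rejected
    (0.3, [0.5, 0.1, 0.0]),  # wrong: one rejected response scores higher
    (0.7, [0.7, 0.2, 0.1]),  # wrong: a tie is not a win
    (1.2, [0.4, 0.8, 1.1]),  # correct
]

print(f"accuracy = {accuracy(toy_scores):.2f}")  # prints "accuracy = 0.50"
```

Under this kind of scoring, a model that ranks responses at random is correct less often as the number of rejected responses per prompt grows, which is one way a benchmark can be made harder.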

Code & Models

Datasets

allenai/reward-bench-2
dataset · 5.7k downloads

Videos

RewardBench 2: Advancing Reward Model Evaluation · SlidesLive