RewardBench 2: Advancing Reward Model Evaluation
Saumya Malik, Valentina Pyatkin, Sander Land, Jacob Morrison, Noah A. Smith, Hannaneh Hajishirzi, Nathan Lambert

TL;DR
RewardBench 2 is a new multi-skill benchmark for reward model evaluation, providing more challenging data that better correlates with downstream task performance in language model training.
Contribution
The paper introduces RewardBench 2, a benchmark built on new human prompts, to make reward model evaluation more rigorous and more relevant to downstream applications.
Findings
Models score about 20 points lower on average on RewardBench 2 than on RewardBench.
Performance on RewardBench 2 correlates with downstream inference and training outcomes.
Abstract
Reward models are used throughout the post-training of language models to capture nuanced signals from preference data and provide a training target for optimization across instruction following, reasoning, safety, and other domains. The community has begun establishing best practices for evaluating reward models, from the development of benchmarks that test capabilities in specific skill areas to others that test agreement with human preferences. At the same time, progress in evaluation has not been mirrored by the effectiveness of reward models in downstream tasks -- simpler direct alignment algorithms are reported to work better in many cases. This paper introduces RewardBench 2, a new multi-skill reward modeling benchmark designed to bring new, challenging data for accuracy-based reward model evaluation -- models score about 20 points lower on average on RewardBench 2 compared to the…
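To make "accuracy-based reward model evaluation" concrete, here is a minimal sketch of one common way such an accuracy can be computed: a prompt counts as correct when the reward model scores the preferred completion strictly higher than every other candidate. The function name `best_of_n_accuracy`, the tie-handling rule, and the example scores are illustrative assumptions, not taken from the paper or its codebase.

```python
from typing import Sequence


def best_of_n_accuracy(
    scores: Sequence[Sequence[float]],
    chosen_index: Sequence[int],
) -> float:
    """Fraction of prompts where the chosen completion gets the strictly highest score.

    scores[i][j] is the reward assigned to candidate j for prompt i.
    chosen_index[i] is the position of the preferred completion for prompt i.
    Both the layout and the strict-inequality tie rule are assumptions of this sketch.
    """
    if len(scores) != len(chosen_index):
        raise ValueError("scores and chosen_index must have the same length")
    if not scores:
        raise ValueError("at least one prompt is required")

    correct = 0
    for row, idx in zip(scores, chosen_index):
        chosen_score = row[idx]
        # A tie with any rejected candidate counts as a miss.
        if all(chosen_score > s for j, s in enumerate(row) if j != idx):
            correct += 1
    return correct / len(scores)


# Hypothetical reward scores for three prompts, four candidates each.
example_scores = [
    [2.1, 0.3, -1.0, 0.9],   # chosen (index 0) wins
    [0.2, 0.5, 0.1, -0.4],   # chosen (index 0) loses to index 1
    [1.5, 1.4, 0.0, -2.0],   # chosen (index 0) wins narrowly
]
example_chosen = [0, 0, 0]

accuracy = best_of_n_accuracy(example_scores, example_chosen)
print(f"accuracy = {accuracy:.3f}")
```

Under this scoring rule, the more rejected candidates there are per prompt, the lower the chance of a correct answer by guessing, which is one way a benchmark can be made harder than a plain pairwise comparison.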
Code & Models
- 🤗 allenai/Llama-3.1-70B-Instruct-RM-RB2 (model) · 25 dl · ♡ 1
- 🤗 allenai/Llama-3.1-8B-Instruct-RM-RB2 (model) · 227 dl · ♡ 1
- 🤗 allenai/Llama-3.1-8B-Base-RM-RB2 (model) · 11 dl
- 🤗 allenai/Llama-3.1-Tulu-3-8B-SFT-RM-RB2 (model) · 13 dl
- 🤗 allenai/Llama-3.1-Tulu-3-8B-DPO-RM-RB2 (model) · 13 dl
- 🤗 allenai/Llama-3.1-Tulu-3-8B-RL-RM-RB2 (model) · 27 dl
- 🤗 allenai/Llama-3.1-Tulu-3-70B-SFT-RM-RB2 (model) · 3 dl
