Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization

Qiyao Ma, Dechen Gao, Rui Cai, Boqi Zhao, Hanchu Zhou, Junshan Zhang, Zhe Zhao

arXiv:2604.07343 · cs.CL · April 9, 2026

TL;DR

Personalized RewardBench is a new benchmark designed to evaluate how well reward models for LLMs capture individual user preferences. It reveals the limitations of current reward models, and its scores correlate with downstream performance.

Contribution

We introduce Personalized RewardBench, a novel benchmark that specifically assesses reward models' ability to model personalized preferences, addressing a key gap in existing evaluation methods.

Findings

1. Existing reward models achieve only 75.94% accuracy in personalization.
2. Personalized RewardBench correlates more strongly with downstream task performance.
3. Human evaluations confirm that personal preference is the primary factor distinguishing the chosen and rejected responses in each pair.
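The second finding concerns how closely benchmark scores track downstream performance. This page does not say which correlation statistic the authors use, so the sketch below assumes Spearman's rank correlation between per-model benchmark accuracy and a downstream score, computed with the standard library only. The helper names (`average_ranks`, `pearson`, `spearman`) are illustrative, and the numbers are hypothetical placeholders, not results from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical placeholder scores for four reward models; NOT results from the paper.
benchmark_accuracy = [0.62, 0.70, 0.68, 0.75]
downstream_score = [0.41, 0.55, 0.50, 0.60]

print(f"Spearman rho = {spearman(benchmark_accuracy, downstream_score):.3f}")
```

A rank correlation only compares the ordering of models, which is what matters when a benchmark is used to choose between reward models; Pearson on the raw scores would be an equally plausible choice.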

Abstract

Pluralistic alignment has emerged as a critical frontier in the development of Large Language Models (LLMs), with reward models (RMs) serving as a central mechanism for capturing diverse human values. While benchmarks for general response quality are prevalent, evaluating how well reward models account for individual user preferences remains an open challenge. To bridge this gap, we introduce Personalized RewardBench, a novel benchmark designed to rigorously assess reward models' capacity to model personalized preferences. We construct chosen and rejected response pairs based on strict adherence to (or violation of) user-specific rubrics, ensuring that preference distinctions are uniquely tailored to the individual. In particular, human evaluations confirm that the primary discriminative factor between pairs is strictly personal preference, with both responses maintaining high general…
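Reward-model benchmarks built from chosen and rejected pairs are commonly scored by pairwise accuracy: the fraction of pairs in which the model assigns a higher reward to the chosen response than to the rejected one. The sketch below assumes that convention and conditions the reward on a user-specific rubric. `PreferencePair`, `toy_reward`, and `pairwise_accuracy` are hypothetical names, the keyword-counting scorer is only a stand-in for a real reward model, and the record layout may differ from the benchmark's actual schema and evaluation code.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical record layout; the real dataset schema may differ.
@dataclass
class PreferencePair:
    rubric: List[str]  # user-specific rubric items
    prompt: str
    chosen: str        # response that adheres to the rubric
    rejected: str      # response that violates the rubric


# A personalized reward function sees the user's rubric, the prompt, and a response.
RewardFn = Callable[[List[str], str, str], float]


def toy_reward(rubric: List[str], prompt: str, response: str) -> float:
    """Hypothetical stand-in reward model: counts rubric terms found in the response."""
    text = response.lower()
    return float(sum(item.lower() in text for item in rubric))


def pairwise_accuracy(pairs: List[PreferencePair], reward_fn: RewardFn) -> float:
    """Fraction of pairs where the chosen response strictly outscores the rejected one."""
    if not pairs:
        raise ValueError("need at least one pair")
    correct = sum(
        reward_fn(p.rubric, p.prompt, p.chosen) > reward_fn(p.rubric, p.prompt, p.rejected)
        for p in pairs
    )
    return correct / len(pairs)


pairs = [
    PreferencePair(["bullet points", "concise"], "Summarize the meeting.",
                   "Concise recap in bullet points: budget approved; launch moved to May.",
                   "The meeting opened with a long discussion that covered many topics."),
    PreferencePair(["vegetarian"], "Suggest a dinner recipe.",
                   "Try a vegetarian lentil curry with rice.",
                   "Try a grilled steak with fries."),
    PreferencePair(["formal"], "Write a greeting.",
                   "Good morning, I hope this message finds you well.",
                   "Hey! Formal greetings are boring, so just hi!"),
]

print(f"accuracy = {pairwise_accuracy(pairs, toy_reward):.3f}")
```

This sketch counts ties as errors (strict `>`), a deliberately conservative choice. The third pair shows how a surface-level scorer can reward a response that merely mentions a rubric term while violating it, which is the kind of failure a personalization benchmark is meant to expose.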

Code & Models

Repositories

martin-qyma/Personalized-RewardBench (GitHub)

Datasets

QiyaoMa/Personalized-RewardBench
