Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization
Qiyao Ma, Dechen Gao, Rui Cai, Boqi Zhao, Hanchu Zhou, Junshan Zhang, Zhe Zhao

TL;DR
Personalized RewardBench is a new benchmark for evaluating how well reward models for LLMs capture individual user preferences. It exposes the limitations of current reward models and examines how benchmark scores correlate with downstream performance.
Contribution
We introduce Personalized RewardBench, a novel benchmark that specifically assesses reward models' ability to model personalized preferences, addressing a key gap in existing evaluation methods.
Findings
Existing reward models achieve only 75.94% accuracy in personalization.
Personalized RewardBench scores correlate more strongly with downstream task performance than those of existing benchmarks.
Human evaluations confirm that the chosen and rejected responses in each pair differ primarily in personal preference.
Abstract
Pluralistic alignment has emerged as a critical frontier in the development of Large Language Models (LLMs), with reward models (RMs) serving as a central mechanism for capturing diverse human values. While benchmarks for general response quality are prevalent, evaluating how well reward models account for individual user preferences remains an open challenge. To bridge this gap, we introduce Personalized RewardBench, a novel benchmark designed to rigorously assess reward models' capacity to model personalized preferences. We construct chosen and rejected response pairs based on strict adherence to (or violation of) user-specific rubrics, ensuring that preference distinctions are uniquely tailored to the individual. In particular, human evaluations confirm that the primary discriminative factor between pairs is strictly personal preference, with both responses maintaining high general…
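The abstract describes chosen and rejected response pairs built from user-specific rubrics. A minimal sketch of how a reward model might be scored on such pairs follows, assuming accuracy means the fraction of pairs where the chosen response receives the strictly higher reward. The summary above does not spell out the metric, so that definition is an assumption, and `PreferencePair`, `RewardFn`, `pairwise_accuracy`, and `toy_reward` are hypothetical names for illustration, not the authors' code.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class PreferencePair:
    """One benchmark item (hypothetical schema, not the paper's format)."""
    user_rubric: str  # user-specific preference description
    prompt: str       # the query sent to the LLM
    chosen: str       # response that adheres to the rubric
    rejected: str     # response that violates the rubric


# A reward function maps (rubric, prompt, response) to a scalar score.
RewardFn = Callable[[str, str, str], float]


def pairwise_accuracy(pairs: Sequence[PreferencePair], reward_fn: RewardFn) -> float:
    """Fraction of pairs where the chosen response scores strictly higher.

    Ties count as errors, which is an assumption of this sketch.
    """
    if not pairs:
        raise ValueError("pairs must be non-empty")
    correct = sum(
        reward_fn(p.user_rubric, p.prompt, p.chosen)
        > reward_fn(p.user_rubric, p.prompt, p.rejected)
        for p in pairs
    )
    return correct / len(pairs)


def toy_reward(rubric: str, prompt: str, response: str) -> float:
    """Stand-in scorer: counts rubric words that appear in the response.

    A real reward model would replace this function entirely.
    """
    text = response.lower()
    return float(sum(word in text for word in rubric.lower().split()))


if __name__ == "__main__":
    demo = [
        PreferencePair("concise bullet", "Summarize the plan.",
                       "A concise bullet list.", "A long essay."),
        PreferencePair("formal", "Write a reply.",
                       "Casual reply.", "Formal reply."),
    ]
    print(pairwise_accuracy(demo, toy_reward))
```

Passing the rubric to the reward function reflects the idea that the preference is user-specific: the same two responses could swap roles under a different rubric, so a scorer that ignores the rubric cannot separate them reliably.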
