The Reward Model Selection Crisis in Personalized Alignment
Fady Rezk, Yuangang Pan, Chuan-Sheng Foo, Xun Xu, Nancy Chen, Henry Gouk, Timothy Hospedales

TL;DR
This paper reveals that standard reward model accuracy is a poor predictor of deployment performance in personalized language model alignment, and introduces new metrics and benchmarks to better evaluate behavioral alignment.
Contribution
It introduces policy accuracy as a new metric, presents the Pref-LaMP benchmark with ground-truth user completions, and shows that in-context learning outperforms reward-guided methods.
Findings
Reward model accuracy poorly predicts deployment success.
In-context learning outperforms reward-guided methods at larger scales.
Ranking metrics are decoupled from the quality of behavioral outputs.
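The selection problem behind the first finding can be illustrated with a rank correlation across candidate reward models. The minimal sketch below computes Spearman's rank correlation between RM accuracy and policy accuracy, then compares which candidate each metric would select. The candidate names and scores are invented for illustration and are not results from the paper. `ranks`, `pearson`, and `spearman` are hypothetical helpers written from scratch to avoid a SciPy dependency.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; raises ValueError if either input is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_a * var_b) ** 0.5


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(ranks(a), ranks(b))


# Hypothetical candidates: name -> (RM accuracy, policy accuracy). Invented values.
candidates: Dict[str, Tuple[float, float]] = {
    "rm_A": (0.74, 0.55),
    "rm_B": (0.71, 0.61),
    "rm_C": (0.68, 0.58),
    "rm_D": (0.65, 0.52),
}
names = list(candidates)
rm_acc = [candidates[n][0] for n in names]
pol_acc = [candidates[n][1] for n in names]

rho = spearman(rm_acc, pol_acc)
best_by_rm = max(names, key=lambda n: candidates[n][0])
best_by_policy = max(names, key=lambda n: candidates[n][1])
print(f"Spearman rho = {rho:.2f}; RM pick = {best_by_rm}; policy pick = {best_by_policy}")
```

With these invented numbers the correlation is a modest 0.4. Selecting by RM accuracy picks `rm_A`, while selecting by policy accuracy picks `rm_B`, which is the kind of mismatch the finding describes.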
Abstract
Personalized alignment from preference data has focused primarily on improving personal reward model (RM) accuracy, with the implicit assumption that better preference ranking translates to better personalized behavior. However, in deployment, computational constraints necessitate inference-time adaptation such as reward-guided decoding (RGD) rather than per-user policy fine-tuning. This creates a critical but overlooked requirement: reward models must not only rank preferences accurately but also effectively guide generation. We demonstrate that standard RM accuracy fails catastrophically as a selection criterion for deployment-ready personalized rewards. We introduce policy accuracy, a metric quantifying whether RGD-adapted LLMs correctly discriminate between preferred and dispreferred responses, and show that upstream RM accuracy correlates only weakly with downstream policy accuracy…
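To make the distinction between RM accuracy and policy accuracy concrete, here is a minimal Python sketch. It assumes, as a simplification not taken from the paper, that an RGD-adapted policy scores a full response as log π_base(y|x) + β·r(x, y). This is the unnormalized log-probability of the KL-regularized policy π(y|x) ∝ π_base(y|x)·exp(β·r(x, y)). Practical RGD methods usually apply the reward token by token, and the paper's exact definition of policy accuracy may differ. The names `rm_accuracy`, `rgd_score`, `policy_accuracy`, and `beta`, along with the toy rewards and log-probabilities, are all hypothetical.

```python
from typing import Callable, Sequence, Tuple

# (prompt x, preferred response y_w, dispreferred response y_l)
Pair = Tuple[str, str, str]
RewardFn = Callable[[str, str], float]   # r(x, y): a personal reward model
LogProbFn = Callable[[str, str], float]  # log pi_base(y | x): the frozen base LLM


def rm_accuracy(pairs: Sequence[Pair], reward: RewardFn) -> float:
    """Fraction of pairs where the reward model scores y_w above y_l."""
    hits = sum(reward(x, yw) > reward(x, yl) for x, yw, yl in pairs)
    return hits / len(pairs)


def rgd_score(x: str, y: str, reward: RewardFn, logprob: LogProbFn, beta: float) -> float:
    """Assumed sequence-level RGD score: log pi_base(y|x) + beta * r(x, y).

    This is the unnormalized log-probability of a policy proportional to
    pi_base(y|x) * exp(beta * r(x, y)); the normalizer depends only on x,
    so it cancels when y_w and y_l are compared for the same prompt.
    """
    return logprob(x, y) + beta * reward(x, y)


def policy_accuracy(pairs: Sequence[Pair], reward: RewardFn,
                    logprob: LogProbFn, beta: float) -> float:
    """Fraction of pairs where the reward-guided policy prefers y_w over y_l."""
    hits = sum(
        rgd_score(x, yw, reward, logprob, beta) > rgd_score(x, yl, reward, logprob, beta)
        for x, yw, yl in pairs
    )
    return hits / len(pairs)


# Toy, hypothetical data: the RM ranks both pairs correctly, but the base model
# strongly prefers the dispreferred answer "b" in the first pair.
pairs = [("q1", "a", "b"), ("q2", "c", "d")]
toy_reward = {"a": 0.6, "b": 0.5, "c": 0.9, "d": 0.1}
toy_logprob = {"a": -12.0, "b": -3.0, "c": -4.0, "d": -6.0}


def reward(x: str, y: str) -> float:
    return toy_reward[y]


def logprob(x: str, y: str) -> float:
    return toy_logprob[y]


print(rm_accuracy(pairs, reward))                           # 1.0
print(policy_accuracy(pairs, reward, logprob, beta=1.0))    # 0.5
print(policy_accuracy(pairs, reward, logprob, beta=100.0))  # 1.0
```

In the toy data the reward model ranks both pairs correctly (RM accuracy 1.0). On the first pair, however, the base model's strong prior for the dispreferred response outweighs the small reward margin, so policy accuracy is only 0.5. Raising `beta` lets the reward dominate and pushes policy accuracy back toward RM accuracy. This illustrates one way the two metrics can diverge at moderate guidance strengths.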
