Evaluating Reward Model Generalization via Pairwise Maximum Discrepancy Competitions
Shunyang Luo, Peibei Cao, Zhihui Zhu, Kehua Feng, Zhihua Wang, Keyan Ding

TL;DR
This paper introduces PMDC, a dynamic evaluation framework that assesses reward model generalization by actively selecting contentious prompt-response pairs. It reveals systematic failures and reshuffles reward model rankings relative to traditional benchmarks.
Contribution
We propose PMDC, a novel, annotation-efficient method for evaluating reward model generalization using active selection and large unlabeled prompt pools, improving over static benchmarks.
Findings
Reward model rankings under PMDC differ substantially from their rankings on traditional benchmarks.
PMDC uncovers systematic generalization failures in reward models.
The framework efficiently identifies highly contentious cases for evaluation.
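The selection of contentious cases can be illustrated with a minimal sketch. It assumes each candidate item is one prompt with two responses, that both reward models' scores have already been put on a comparable scale, and that discrepancy is scored as the smaller of the two preference margins whenever the models prefer opposite responses. The function name `select_contentious` and this particular discrepancy score are illustrative assumptions, not the measure defined in the paper.

```python
import numpy as np


def select_contentious(scores_a, scores_b, k):
    """Pick the k items on which two reward models disagree most.

    scores_a, scores_b: arrays of shape (n_items, 2) holding the score each
    model assigns to response 0 and response 1 of every item. Scores are
    assumed to be standardized per model beforehand (an assumption of this
    sketch, since raw reward scales usually differ between models).
    """
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)

    # Signed preference margin of each model: positive means response 0 wins.
    margin_a = scores_a[:, 0] - scores_a[:, 1]
    margin_b = scores_b[:, 0] - scores_b[:, 1]

    # Hypothetical discrepancy score: nonzero only when the two models prefer
    # opposite responses, and large only when both are confident.
    opposite = margin_a * margin_b < 0
    disc = np.where(opposite,
                    np.minimum(np.abs(margin_a), np.abs(margin_b)),
                    0.0)

    # Stable sort keeps the original order among ties.
    order = np.argsort(-disc, kind="stable")
    return order[:k], disc


if __name__ == "__main__":
    a = [[2.0, 0.0], [1.0, 0.0], [0.0, 3.0], [1.0, 1.5]]
    b = [[0.0, 2.0], [2.0, 0.0], [1.0, 0.0], [0.9, 0.1]]
    top, disc = select_contentious(a, b, k=2)
    print(top, disc)
```

Taking the minimum of the two margins favors items where both models are confident and opposed, so a single oracle judgment on such an item clearly separates the pair.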
Abstract
Reward models (RMs) are central to aligning large language models, yet their practical effectiveness hinges on generalization to unseen prompts and shifting distributions. Most existing RM evaluations rely on static, pre-annotated preference datasets, which provide limited coverage and often fail to faithfully assess generalization in open-world settings. We introduce Pairwise Maximum Discrepancy Competition (PMDC), a dynamic and annotation-efficient framework for evaluating RM generalization using a large, unlabeled, open-domain prompt pool. PMDC actively selects prompt-response pairs that maximize disagreement between two RMs, yielding a compact set of highly contentious test cases. These cases are adjudicated by an oracle, and the resulting outcomes are aggregated via a Bradley-Terry model to produce a global ranking and pairwise win-rate landscape of RMs. We apply PMDC to…
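The aggregation step can be sketched as follows, assuming the oracle outcomes have been tallied into a matrix `wins` where `wins[i][j]` counts how often RM `i` was judged correct against RM `j`. The fit uses the standard minorization-maximization update for Bradley-Terry strengths. The function names, the iteration settings, and the requirement that every model has at least one win are assumptions of this sketch, and the paper's own fitting procedure may differ.

```python
import numpy as np


def fit_bradley_terry(wins, iters=200, tol=1e-10):
    """Estimate Bradley-Terry strengths from a pairwise win-count matrix.

    wins[i, j] is the number of comparisons model i won against model j.
    Assumes every model has at least one win and at least one comparison,
    otherwise the maximum-likelihood estimate is not finite.
    """
    wins = np.asarray(wins, dtype=float)
    n = wins.shape[0]
    games = wins + wins.T            # comparisons played between each pair
    total_wins = wins.sum(axis=1)    # wins of each model over all opponents
    p = np.full(n, 1.0 / n)          # uniform starting strengths

    for _ in range(iters):
        # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
        denom = games / (p[:, None] + p[None, :])
        new_p = total_wins / denom.sum(axis=1)
        new_p /= new_p.sum()         # fix the scale so strengths sum to 1
        converged = np.max(np.abs(new_p - p)) < tol
        p = new_p
        if converged:
            break
    return p


def win_rate_matrix(p):
    """Model-implied probability that model i beats model j."""
    p = np.asarray(p, dtype=float)
    return p[:, None] / (p[:, None] + p[None, :])


if __name__ == "__main__":
    wins = [[0, 4, 4],
            [1, 0, 3],
            [1, 2, 0]]
    strengths = fit_bradley_terry(wins)
    print("ranking (best first):", np.argsort(-strengths))
    print(win_rate_matrix(strengths).round(3))
```

Sorting the fitted strengths gives a global ranking, and `win_rate_matrix` gives the model-implied pairwise win rates, which correspond to the two outputs the abstract describes.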
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
