P-Check: Advancing Personalized Reward Model via Learning to Generate Dynamic Checklist
Kwangwook Seo, Dongha Lee

TL;DR
P-Check is a personalized reward modeling framework that pairs a dynamic checklist generator with a new training strategy to improve reward accuracy and personalized generation, including in out-of-distribution scenarios.
Contribution
The paper proposes a plug-and-play checklist generator and Preference-Contrastive Criterion Weighting, a training strategy that scores each criterion by its discriminative power, so that personalized reward models better capture the dynamic nuances of human judgment.
Findings
Improves reward prediction accuracy.
Enhances downstream personalized generation.
Robust in out-of-distribution scenarios.
Abstract
Recent approaches in personalized reward modeling have primarily focused on leveraging user interaction history to align model judgments with individual preferences. However, existing approaches largely treat user context as a static or implicit conditioning signal, failing to capture the dynamic and multi-faceted nature of human judgment. In this paper, we propose P-Check, a novel personalized reward modeling framework, designed to train a plug-and-play checklist generator that synthesizes dynamic evaluation criteria for guiding the reward prediction. To better align these checklists with personalized nuances, we introduce Preference-Contrastive Criterion Weighting, a training strategy that assigns saliency scores to criteria based on their discriminative power for personalized judgment. We conduct extensive experiments and demonstrate that P-Check not only improves reward accuracy but…
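The abstract does not give the formula behind Preference-Contrastive Criterion Weighting, so the sketch below is only an illustration of the general idea of weighting criteria by how well they separate a preferred response from a rejected one. Everything in it is an assumption made for illustration and not the authors' method: the function names `saliency_weights` and `checklist_reward`, the softmax over score margins, the `temperature` parameter, the example checklist, and the per-criterion scores.

```python
import math

def saliency_weights(chosen_scores, rejected_scores, temperature=1.0):
    """Hypothetical saliency: softmax over per-criterion score margins.

    A criterion that the preferred response satisfies much better than
    the rejected one gets a larger weight. This is an assumed stand-in,
    not the weighting rule from the paper.
    """
    margins = [c - r for c, r in zip(chosen_scores, rejected_scores)]
    peak = max(margins)  # subtract the max for numerical stability
    exps = [math.exp((m - peak) / temperature) for m in margins]
    total = sum(exps)
    return [e / total for e in exps]

def checklist_reward(scores, weights):
    """Aggregate per-criterion scores into one scalar reward."""
    return sum(w * s for w, s in zip(weights, scores))

# Made-up checklist and scores in [0, 1] for a single user and prompt.
checklist = [
    "Uses a casual tone",
    "Keeps the answer short",
    "Cites sources",
]
chosen = [0.9, 0.8, 0.5]    # response the user preferred
rejected = [0.2, 0.7, 0.5]  # response the user rejected

weights = saliency_weights(chosen, rejected)
for criterion, weight in zip(checklist, weights):
    print(f"{weight:.3f}  {criterion}")
print("reward(chosen)   =", round(checklist_reward(chosen, weights), 3))
print("reward(rejected) =", round(checklist_reward(rejected, weights), 3))
```

In this toy setup the first criterion has the largest margin between the two responses, so it receives the largest weight, and the weighted reward ranks the preferred response above the rejected one.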
