Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown
Xingzhou Lou, Dong Yan, Wei Shen, Yuzi Yan, Jian Xie, Junge Zhang

TL;DR
This paper introduces an uncertainty-aware reward model (URM) and its ensemble variant (URME) that better capture human preference uncertainties, leading to improved alignment and generation quality in large language models.
Contribution
The paper proposes probabilistic and ensemble-based methods to quantify both aleatoric and epistemic uncertainties in reward models, enhancing reliability and alignment in LLMs.
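As a rough illustration of the probabilistic side, the sketch below scores a predicted distribution against an observed attribute rating. It assumes each preference attribute is modeled as an independent Gaussian whose mean and log standard deviation come from the value head; that parameterization, and the function names `gaussian_nll` and `attribute_nll`, are assumptions made for this example and are not taken from the summary above.

```python
import math


def gaussian_nll(mean, log_std, target):
    """Negative log-likelihood of `target` under N(mean, exp(log_std)**2).

    Assumption: the value head outputs a mean and a log standard deviation
    per attribute. The predicted spread plays the role of aleatoric uncertainty.
    """
    var = math.exp(2.0 * log_std)
    return 0.5 * math.log(2.0 * math.pi) + log_std + (target - mean) ** 2 / (2.0 * var)


def attribute_nll(means, log_stds, targets):
    """Average the per-attribute NLL over all (hypothetical) preference attributes."""
    losses = [gaussian_nll(m, s, t) for m, s, t in zip(means, log_stds, targets)]
    return sum(losses) / len(losses)
```

Under this loss, a head that predicts a wide distribution is penalized less for a large error than one that was confidently wrong, which is what lets the predicted spread track noise in the labels.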
Findings
URM achieves strong performance on RewardBench, outperforming competitive large-scale models.
Lower uncertainty correlates with higher reward prediction reliability.
URM and URME improve LLM generation quality across multiple evaluation methods.
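One way to picture how ensemble disagreement could flag unreliable rewards is the sketch below. It assumes each ensemble member returns a scalar reward per response and uses the population standard deviation across members as the disagreement measure; the summary does not specify the paper's actual discrepancy measure, and `ensemble_reward`, `best_of_n`, and `max_disagreement` are hypothetical names introduced for this example.

```python
from statistics import mean, pstdev


def ensemble_reward(member_rewards):
    """Return (mean reward, disagreement) for one response.

    Assumption: disagreement is the population standard deviation of the
    members' scalar rewards, used here as a stand-in for epistemic uncertainty.
    """
    return mean(member_rewards), pstdev(member_rewards)


def best_of_n(candidates, max_disagreement):
    """Pick the highest-reward response among those the ensemble agrees on.

    `candidates` is a list of (response, member_rewards) pairs. If every
    candidate exceeds the disagreement threshold, fall back to all of them.
    """
    scored = [(resp, *ensemble_reward(rewards)) for resp, rewards in candidates]
    reliable = [s for s in scored if s[2] <= max_disagreement]
    pool = reliable or scored
    return max(pool, key=lambda s: s[1])[0]
```

For example, given `[("a", [1.0, 1.1, 0.9]), ("b", [3.0, -1.0, 4.0])]` and a threshold of `0.5`, response `"b"` has the higher mean reward but is filtered out because the members disagree strongly, so `"a"` is selected.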
Abstract
Reward models (RMs) are essential for aligning large language models (LLMs) with human expectations. However, existing RMs struggle to capture the stochastic and uncertain nature of human preferences and fail to assess the reliability of reward predictions. To address these challenges, we introduce the Uncertainty-aware Reward Model (URM) and its ensemble variant, URME. URM employs a probabilistic value head to capture aleatoric uncertainty by modeling the distribution of disentangled human preference attributes. URME further quantifies epistemic uncertainty by examining discrepancies among individual URMs within the ensemble, enabling identification of unreliable evaluations. Our empirical evaluations demonstrate that URM achieves strong performance on RewardBench, outperforming competitive large-scale models. Additionally, extensive experiments, including best-of-n sampling (BoN),…
Taxonomy
Topics: Advanced Text Analysis Techniques · Software Engineering Research · Statistical and Computational Modeling
