Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown

Xingzhou Lou; Dong Yan; Wei Shen; Yuzi Yan; Jian Xie; Junge Zhang

arXiv:2410.00847 · cs.LG · February 13, 2025

TL;DR

This paper introduces an uncertainty-aware reward model (URM) and its ensemble variant (URME) that better capture human preference uncertainties, leading to improved alignment and generation quality in large language models.

Contribution

The paper proposes probabilistic and ensemble-based methods to quantify both aleatoric and epistemic uncertainties in reward models, enhancing reliability and alignment in LLMs.

Findings

01. URM outperforms existing models on RewardBench.

02. Lower uncertainty correlates with higher reward prediction reliability.

03. URM and URME improve LLM generation quality across multiple evaluation methods.
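The second finding suggests one practical use of ensemble uncertainty: discard reward predictions on which the ensemble members disagree. The sketch below is illustrative only. It assumes disagreement is measured as the standard deviation of scalar rewards across members, which this page does not confirm as the paper's measure, and the names `ensemble_reward`, `filter_reliable`, and `max_disagreement` are hypothetical.

```python
from statistics import mean, pstdev


def ensemble_reward(member_rewards):
    """Return (mean reward, disagreement) for one prompt-response pair.

    member_rewards holds one scalar reward per ensemble member.
    Disagreement is the population standard deviation across members,
    used here as an assumed proxy for epistemic uncertainty.
    """
    return mean(member_rewards), pstdev(member_rewards)


def filter_reliable(candidates, max_disagreement):
    """Keep candidates whose disagreement is at or below the threshold."""
    kept = []
    for name, rewards in candidates.items():
        avg, spread = ensemble_reward(rewards)
        if spread <= max_disagreement:
            kept.append((name, avg, spread))
    # Rank the surviving candidates by mean reward, highest first.
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Made-up rewards from a three-member ensemble, for illustration only.
candidates = {
    "response_a": [0.80, 0.82, 0.78],
    "response_b": [0.95, 0.10, 0.60],
    "response_c": [0.40, 0.42, 0.41],
}
reliable = filter_reliable(candidates, max_disagreement=0.05)
print([name for name, _, _ in reliable])
```

Here `response_b` is dropped because its members disagree strongly, even though one member scores it highest. The threshold is a free parameter that would need tuning on held-out data.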

Abstract

Reward models (RMs) are essential for aligning large language models (LLMs) with human expectations. However, existing RMs struggle to capture the stochastic and uncertain nature of human preferences and fail to assess the reliability of reward predictions. To address these challenges, we introduce the Uncertainty-aware Reward Model (URM) and its ensemble variant, URME. URM employs a probabilistic value head to capture aleatoric uncertainty by modeling the distribution of disentangled human preference attributes. URME further quantifies epistemic uncertainty by examining discrepancies among individual URMs within the ensemble, enabling identification of unreliable evaluations. Our empirical evaluations demonstrate that URM achieves strong performance on RewardBench, outperforming competitive large-scale models. Additionally, extensive experiments, including best-of-n sampling (BoN),…
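To make the idea of a probabilistic value head concrete, here is a minimal sketch of a loss such a head could be trained with. It assumes each preference attribute is modeled as an independent Gaussian whose mean and log standard deviation are predicted by the head. The abstract says only that the head models a distribution over attributes, so this parameterization, the example attribute values, and the function names are assumptions rather than the authors' implementation.

```python
import math


def gaussian_nll(target, mu, log_sigma):
    """Negative log-likelihood of target under N(mu, sigma^2)."""
    sigma = math.exp(log_sigma)
    return (
        0.5 * math.log(2 * math.pi)
        + log_sigma
        + 0.5 * ((target - mu) / sigma) ** 2
    )


def attribute_nll(targets, mus, log_sigmas):
    """Sum the per-attribute NLL, treating attributes as independent."""
    return sum(
        gaussian_nll(t, m, s) for t, m, s in zip(targets, mus, log_sigmas)
    )


# Hypothetical labels for two attributes and the head's predicted means.
targets = [3.0, 4.0]
mus = [3.0, 2.0]  # the second prediction is off by 2

# Same means, but different predicted spread on the second attribute.
confident = attribute_nll(targets, mus, [math.log(0.5), math.log(0.5)])
uncertain = attribute_nll(targets, mus, [math.log(0.5), math.log(2.0)])
print(confident > uncertain)
```

Under this loss, a head that predicts a wider spread where its mean is wrong pays a smaller penalty than one that is confidently wrong. This is one way a model can learn to report aleatoric uncertainty from noisy labels.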

Taxonomy

Topics: Advanced Text Analysis Techniques · Software Engineering Research · Statistical and Computational Modeling