Uncertainty Quantification for Large Language Model Reward Learning under Heterogeneous Human Feedback

Pangpang Liu; Junwei Lu; Will Wei Sun

arXiv:2512.03208·stat.ML·December 4, 2025

TL;DR

This paper develops a statistical framework for quantifying uncertainty in reward models used in aligning large language models, addressing heterogeneity in human feedback with theoretical guarantees and practical algorithms.

Contribution

It introduces a heterogeneous preference model and an alternating gradient descent algorithm with proven convergence and asymptotic properties for reward estimation.

Findings

01 The method provides valid confidence intervals for reward estimates.

02 Uncertainty quantification improves reward comparison and policy selection.

03 Simulations and real data demonstrate practical effectiveness.
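To make the link between confidence intervals and reward comparison concrete, here is a minimal sketch of a Wald-type interval for the difference between two estimated rewards. It assumes an approximately normal estimator with an available covariance estimate; the function name `reward_difference_ci` and the inputs `r_hat` and `cov` are hypothetical, and this page does not state the paper's actual variance estimator.

```python
from statistics import NormalDist

import numpy as np


def reward_difference_ci(r_hat, cov, i, j, level=0.95):
    """Wald-type interval for r_i - r_j (illustrative, not the paper's procedure).

    r_hat : estimated rewards, shape (n_items,)            [assumed input]
    cov   : estimated covariance of r_hat, (n_items, n_items) [assumed input]
    """
    # Two-sided normal quantile for the requested coverage level.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    diff = r_hat[i] - r_hat[j]
    # Variance of a difference: Var(a) + Var(b) - 2 Cov(a, b).
    var = cov[i, i] + cov[j, j] - 2.0 * cov[i, j]
    half_width = z * np.sqrt(var)
    return diff - half_width, diff + half_width
```

Under these assumptions, an interval that excludes zero suggests that one answer's reward is distinguishable from the other's at the chosen level, while an interval that covers zero suggests the comparison is not yet settled by the data.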

Abstract

We study estimation and statistical inference for reward models used in aligning large language models (LLMs). A key component of LLM alignment is reinforcement learning from human feedback (RLHF), where humans compare pairs of model-generated answers and their preferences are used to train a reward model. However, human feedback is inherently heterogeneous, creating significant challenges for reliable reward learning. To address this, we adopt a heterogeneous preference framework that jointly models the latent reward of answers and human rationality. This leads to a challenging biconvex optimization problem, which we solve via an alternating gradient descent algorithm. We establish theoretical guarantees for the resulting estimator, including its convergence and asymptotic distribution. These results enable the construction of confidence intervals for reward estimates. Leveraging these…
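The abstract does not spell out the model, so the following is only a sketch under stated assumptions. It assumes a Bradley-Terry-style likelihood in which annotator k prefers answer i over answer j with probability sigmoid(gamma_k * (r_i - r_j)), where r is the latent reward and gamma_k is a per-annotator rationality scale. The negative log-likelihood of this assumed model is convex in r for fixed gamma and convex in gamma for fixed r, which is what makes it biconvex. The function names, step sizes, and the normalization (rewards centered, mean rationality fixed at 1) are illustrative choices, not taken from the paper.

```python
import numpy as np


def sigmoid(x):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def simulate(n_items, n_annotators, n_comparisons, rng):
    """Draw synthetic comparisons from the ASSUMED model."""
    r = rng.normal(size=n_items)
    r -= r.mean()
    gamma = rng.uniform(0.5, 2.0, size=n_annotators)
    gamma /= gamma.mean()
    i = rng.integers(0, n_items, size=n_comparisons)
    # A nonzero offset modulo n_items guarantees j != i.
    j = (i + rng.integers(1, n_items, size=n_comparisons)) % n_items
    k = rng.integers(0, n_annotators, size=n_comparisons)
    y = (rng.random(n_comparisons) < sigmoid(gamma[k] * (r[i] - r[j]))).astype(float)
    return r, gamma, i, j, k, y


def neg_log_lik(r, gamma, i, j, k, y):
    # Mean logistic loss: log(1 + exp(z)) - y * z.
    z = gamma[k] * (r[i] - r[j])
    return np.mean(np.logaddexp(0.0, z) - y * z)


def alternating_gd(i, j, k, y, n_items, n_annotators,
                   n_iters=500, lr_r=2.0, lr_gamma=2.0):
    """Alternate one gradient step on r with one gradient step on gamma."""
    n = len(y)
    r = np.zeros(n_items)
    gamma = np.ones(n_annotators)
    for _ in range(n_iters):
        # Reward step with gamma held fixed.
        resid = sigmoid(gamma[k] * (r[i] - r[j])) - y
        g = resid * gamma[k] / n
        grad_r = np.zeros(n_items)
        np.add.at(grad_r, i, g)
        np.add.at(grad_r, j, -g)
        r -= lr_r * grad_r
        r -= r.mean()  # rewards are identified only up to a shift

        # Rationality step with r held fixed.
        d = r[i] - r[j]
        resid = sigmoid(gamma[k] * d) - y
        grad_gamma = np.zeros(n_annotators)
        np.add.at(grad_gamma, k, resid * d / n)
        gamma -= lr_gamma * grad_gamma
        gamma = np.maximum(gamma, 1e-3)  # keep rationality positive

        # Fix the scale ambiguity: gamma * r is unchanged by this rescaling.
        s = gamma.mean()
        gamma /= s
        r *= s
    return r, gamma


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    r_true, gamma_true, i, j, k, y = simulate(10, 5, 20000, rng)
    r_hat, gamma_hat = alternating_gd(i, j, k, y, 10, 5)
    print("final loss:", neg_log_lik(r_hat, gamma_hat, i, j, k, y))
```

The rescaling at the end of each iteration is one simple way to handle the fact that multiplying all gamma_k by a constant and dividing all rewards by the same constant leaves the assumed likelihood unchanged; the paper may use a different identifiability constraint.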

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques