Optimal Transport for LLM Reward Modeling from Noisy Preference

Licheng Pan; Haochen Yang; Haoxuan Li; Yunsheng Lu; Yongqi Tong; Yinuo Wang; Shijian Wang; Zhixuan Chu; Lei Shen; Yuan Lu; Hao Wang

arXiv:2605.06036·cs.LG·May 8, 2026

TL;DR

This paper introduces SelectiveRM, an optimal-transport framework for RLHF reward modeling that handles noisy preferences by aligning the distribution of model predictions with the preference data and selectively excluding outliers.

Contribution

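The paper presents an optimal transport framework for reward modeling from noisy preference data. It combines a Joint Consistency Discrepancy, which aligns the distribution of model predictions with the preference data, with a Mass Relaxation mechanism based on partial transport, which removes the strict mass conservation that would otherwise force the model to fit outliers.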

Findings

01. SelectiveRM outperforms state-of-the-art baselines on multiple benchmarks.

02. Theoretical analysis shows tighter bounds on unobserved clean risk.

03. The approach effectively excludes noisy outliers while aligning model predictions with preferences.

Abstract

Reward models are fundamental to Reinforcement Learning from Human Feedback (RLHF), yet real-world datasets are inevitably corrupted by noisy preferences. Conventional training objectives tend to overfit these errors, while existing denoising approaches often rely on homogeneous noise assumptions that fail to capture the complexity of linguistic preferences. To address these challenges, we propose SelectiveRM, a framework grounded in optimal transport. We first devise a Joint Consistency Discrepancy to align the distribution of model predictions with preference data. Furthermore, to address the limitation of strict mass conservation, which compels the model to fit outliers, we incorporate a Mass Relaxation mechanism via partial transport. This enables the autonomous exclusion of samples whose noisy preferences contradict semantic consistency. Theoretically, we demonstrate that…
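
For readers unfamiliar with partial transport, the standard discrete formulation is sketched below. This is the textbook problem, not necessarily the exact SelectiveRM objective: the cost matrix $C$, the marginals $a$ and $b$, and the transported-mass budget $s$ are generic placeholders, and how the Joint Consistency Discrepancy defines the cost is not stated in this excerpt.

```latex
\min_{T \in \mathbb{R}_{\ge 0}^{n \times m}} \; \langle T, C \rangle
  = \sum_{i=1}^{n} \sum_{j=1}^{m} T_{ij} C_{ij}
\quad \text{s.t.} \quad
T \mathbf{1}_m \le a, \qquad
T^{\top} \mathbf{1}_n \le b, \qquad
\mathbf{1}_n^{\top} T \mathbf{1}_m = s, \qquad
0 \le s \le \min\bigl(\|a\|_1, \|b\|_1\bigr).
```

Replacing the two inequalities with the equalities $T \mathbf{1}_m = a$ and $T^{\top} \mathbf{1}_n = b$ recovers balanced optimal transport, in which every sample must be matched (strict mass conservation). When $s$ is below the total mass, the plan may leave some mass unmatched, typically where matching would be most costly, which is the general idea behind excluding outliers through mass relaxation.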
