WPO: Enhancing RLHF with Weighted Preference Optimization

Wenxuan Zhou; Ravi Agrawal; Shujian Zhang; Sathish Reddy Indurthi; Sanqiang Zhao; Kaiqiang Song; Silei Xu; Chenguang Zhu

arXiv:2406.11827·cs.CL·October 7, 2024·2 cites

TL;DR

WPO introduces a reweighting strategy for off-policy preference data in RLHF, improving alignment of language models with human values by simulating on-policy learning without extra costs.

Contribution

The paper proposes Weighted Preference Optimization (WPO), a novel method that mitigates distributional gaps in off-policy RLHF by reweighting preference data to resemble on-policy data.

Findings

01. WPO outperforms DPO by up to 5.6% on Alpaca Eval 2.
02. WPO achieves a 76.7% length-controlled winning rate against GPT-4-turbo.
03. WPO enhances RLHF without additional costs.

Abstract

Reinforcement learning from human feedback (RLHF) is a promising solution to align large language models (LLMs) more closely with human values. Off-policy preference optimization, where the preference data is obtained from other models, is widely adopted due to its cost efficiency and scalability. However, off-policy preference optimization often suffers from a distributional gap between the policy used for data collection and the target policy, leading to suboptimal optimization. In this paper, we propose a novel strategy to mitigate this problem by simulating on-policy learning with off-policy preference data. Our Weighted Preference Optimization (WPO) method adapts off-policy data to resemble on-policy data more closely by reweighting preference pairs according to their probability under the current policy. This method not only addresses the distributional gap problem but also…
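
The reweighting idea in the abstract can be illustrated with a small sketch. The abstract only says that preference pairs are reweighted by their probability under the current policy; everything else here is an assumption made for illustration: the base objective is taken to be a DPO-style loss, the weight is taken to be the product of length-normalized sequence probabilities of the chosen and rejected responses, and the function names (`dpo_loss`, `pair_weight`, `weighted_preference_loss`) are hypothetical and not taken from the official repository.

```python
import math


def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO-style loss for one preference pair, from summed sequence log-probs.

    Assumption: the base objective is DPO; the abstract does not specify it.
    """
    margin = beta * ((policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l))
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def pair_weight(token_logps_w, token_logps_l):
    """Hypothetical pair weight in (0, 1].

    Assumption: each response is scored by its length-normalized
    probability under the current policy, and the two scores are multiplied.
    """
    avg_w = sum(token_logps_w) / len(token_logps_w)
    avg_l = sum(token_logps_l) / len(token_logps_l)
    return math.exp(avg_w + avg_l)


def weighted_preference_loss(token_logps_w, token_logps_l,
                             ref_logp_w, ref_logp_l, beta=0.1):
    """Scale the per-pair loss by how likely the pair is under the policy."""
    # In an autograd framework the weight would presumably be treated as a
    # constant (no gradient through it); that is also an assumption here.
    weight = pair_weight(token_logps_w, token_logps_l)
    loss = dpo_loss(sum(token_logps_w), sum(token_logps_l),
                    ref_logp_w, ref_logp_l, beta)
    return weight * loss


# A pair the current policy finds likely keeps most of its loss;
# an unlikely (more off-policy) pair is down-weighted.
likely = weighted_preference_loss([-0.1, -0.2], [-0.3, -0.1], -0.3, -0.4)
unlikely = weighted_preference_loss([-3.0, -4.0], [-5.0, -2.0], -7.0, -7.0)
```

Under these assumptions, pairs that look like something the current policy would generate dominate the update, which is one way off-policy data could be made to behave more like on-policy data.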

Code & Models

Repositories

wzhouad/wpo (PyTorch, official)

Taxonomy

Topics: Constraint Satisfaction and Optimization

Methods: ALIGN