Adaptive Preference Scaling for Reinforcement Learning with Human Feedback

Ilgee Hong; Zichong Li; Alexander Bukharin; Yixiao Li; Haoming Jiang; Tianbao Yang; Tuo Zhao

arXiv:2406.02764·cs.LG·June 6, 2024·1 cites

Open Access · 1 Video

TL;DR

This paper introduces an adaptive preference scaling method for reinforcement learning with human feedback, improving reward modeling by accounting for varying preference strengths, leading to better policy performance and easier hyperparameter tuning.

Contribution

It proposes a novel adaptive preference loss based on distributionally robust optimization that dynamically adjusts to preference ambiguity, enhancing reward function flexibility.

Findings

01 Improves policy performance in robotic control and language generation tasks.

02 Aligns reward functions more closely with policy optimization.

03 Simplifies the hyperparameter tuning process.

Abstract

Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values by learning rewards from human preference data. Due to various reasons, however, such data typically takes the form of rankings over pairs of trajectory segments, which fails to capture the varying strengths of preferences across different pairs. In this paper, we propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO), designed to address this uncertainty in preference strength. By incorporating an adaptive scaling parameter into the loss for each pair, our method increases the flexibility of the reward function. Specifically, it assigns small scaling parameters to pairs with ambiguous preferences, leading to more comparable rewards, and large scaling parameters to those with clear preferences for more distinct rewards.…
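
The abstract is truncated here and does not state the loss itself, so the following is only a minimal sketch of how a per-pair scaling parameter could behave. It assumes a temperature-scaled logistic (Bradley-Terry style) loss with a linear term in the scale; the functional form, the names `scaled_pair_loss` and `best_tau`, the weight `rho`, and the bounds `tau_min` and `tau_max` are all hypothetical and are not taken from the paper. Under these assumptions, a pair whose rewards are nearly equal is assigned a small scale, and a pair with a large reward gap is assigned a large one.

```python
import math


def scaled_pair_loss(delta, tau, rho=0.1):
    """Illustrative per-pair loss (assumed form, not the paper's):
    -tau * log(sigmoid(delta / tau)) + (rho - log 2) * tau,
    where delta = reward(preferred) - reward(rejected)."""
    x = delta / tau
    # Numerically stable log(1 + exp(-x)), i.e. -log(sigmoid(x)).
    neg_log_sigmoid = max(-x, 0.0) + math.log1p(math.exp(-abs(x)))
    return tau * neg_log_sigmoid + (rho - math.log(2.0)) * tau


def best_tau(delta, tau_min=0.1, tau_max=5.0, rho=0.1, steps=491):
    """Pick the scale for one pair by grid search over [tau_min, tau_max].
    The bounds and grid resolution are hypothetical choices."""
    grid = [tau_min + i * (tau_max - tau_min) / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda tau: scaled_pair_loss(delta, tau, rho))


if __name__ == "__main__":
    # Reward gaps for an ambiguous, a mildly clear, and a very clear pair.
    for delta in (0.0, 0.2, 3.0):
        print(f"reward gap {delta:.1f} -> chosen scale {best_tau(delta):.2f}")
```

In this sketch the inner minimization is a one-dimensional search per pair, so it can be solved independently of the reward model update. Grid search is used purely for readability; any bounded scalar minimizer would do.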

Videos

Adaptive Preference Scaling for Reinforcement Learning with Human Feedback · SlidesLive

Taxonomy

Topics: Evolutionary Algorithms and Applications · Reinforcement Learning in Robotics

Methods: ALIGN