Group Robust Preference Optimization in Reward-free RLHF

Shyam Sundhar Ramesh; Yifan Hu; Iason Chaimalas; Viraj Mehta; Pier Giuseppe Sessa; Haitham Bou Ammar; Ilija Bogunovic

arXiv:2405.20304 · cs.CL · May 31, 2024

TL;DR

This paper introduces Group Robust Preference Optimization (GRPO), a method for fine-tuning large language models that optimizes worst-case group performance, so that the resulting model performs robustly across diverse preference groups rather than favoring a single aggregate preference.

Contribution

The paper proposes Group Robust Preference Optimization (GRPO), a method that aligns LLMs robustly to the preferences of individual groups, addressing the lack of group robustness in traditional RLHF.
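The worst-case objective described here can be written down as a sketch. The notation is assumed for illustration and does not come from this summary: $K$ groups indexed by $g$, a per-group preference loss $\mathcal{L}_g(\pi)$ for policy $\pi$ (for instance a DPO-style loss), and group weights $\alpha$ on the probability simplex $\Delta_K$.

```latex
% Worst-case (group-robust) objective: minimize the largest per-group loss.
\min_{\pi} \; \max_{g \in \{1,\dots,K\}} \mathcal{L}_g(\pi)

% Equivalent form with weights on the simplex: a linear function of \alpha
% is maximized at a vertex, i.e. by placing all weight on the worst group.
\min_{\pi} \; \max_{\alpha \in \Delta_K} \; \sum_{g=1}^{K} \alpha_g \, \mathcal{L}_g(\pi),
\qquad
\Delta_K = \Big\{ \alpha \in \mathbb{R}^K : \alpha_g \ge 0, \; \sum_{g=1}^{K} \alpha_g = 1 \Big\}
```

The weighted form is convenient because the weights can be adjusted gradually during training instead of switching abruptly to whichever group is currently worst.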

Findings

01. Improved worst-group performance in LLM fine-tuning.

02. Reduced loss imbalance across diverse preference groups.

03. Enhanced probability accuracy compared to baseline methods.

Abstract

Adapting large language models (LLMs) for specific tasks usually involves fine-tuning through reinforcement learning with human feedback (RLHF) on preference data. While these data often come from diverse labelers' groups (e.g., different demographics, ethnicities, company teams, etc.), traditional RLHF approaches adopt a "one-size-fits-all" approach, i.e., they indiscriminately assume and optimize a single preference model, thus not being robust to unique characteristics and needs of the various groups. To address this limitation, we propose a novel Group Robust Preference Optimization (GRPO) method to align LLMs to individual groups' preferences robustly. Our approach builds upon reward-free direct preference optimization methods, but unlike previous approaches, it seeks a robust policy which maximizes the worst-case group performance. To achieve this, GRPO adaptively and sequentially…
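To make the idea of a group-robust, reward-free objective concrete, the following is a minimal sketch, not the paper's implementation. It computes the standard DPO loss per preference pair, averages it per group, and then reweights groups so that higher-loss groups count more. The function names, the toy log-probabilities, and the exponentiated-gradient weight update are all assumptions chosen for illustration; the update is a generic stand-in and not necessarily the rule GRPO uses.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one pair (chosen response w, rejected response l)."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def group_losses(batch, beta=0.1):
    """Mean DPO loss per group.

    `batch` is a list of (group, logp_w, logp_l, ref_logp_w, ref_logp_l)
    tuples; this layout is hypothetical.
    """
    sums, counts = {}, {}
    for g, lw, ll, rw, rl in batch:
        sums[g] = sums.get(g, 0.0) + dpo_loss(lw, ll, rw, rl, beta)
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}


def update_group_weights(weights, losses, step_size=1.0):
    """Assumed update: exponentiated-gradient step that upweights high-loss groups."""
    raw = {g: weights[g] * math.exp(step_size * losses[g]) for g in weights}
    total = sum(raw.values())
    return {g: v / total for g, v in raw.items()}


def robust_objective(weights, losses):
    """Weighted sum of group losses; the policy would be trained to minimize this."""
    return sum(weights[g] * losses[g] for g in weights)


# Toy data: the policy agrees with group A's preference and disagrees with group B's.
batch = [
    ("A", -1.0, -2.0, -1.5, -1.5),
    ("B", -2.0, -1.0, -1.5, -1.5),
]
losses = group_losses(batch)
weights = {g: 1.0 / len(losses) for g in losses}  # start from uniform weights
new_weights = update_group_weights(weights, losses)
print(losses, new_weights, robust_objective(new_weights, losses))
```

In a real training loop the policy parameters would take a gradient step on `robust_objective` between weight updates; here the log-probabilities are fixed numbers so the example stays self-contained.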

Code & Models

Videos

Group Robust Preference Optimization in Reward-free RLHF · SlidesLive

Taxonomy

Topics: Advanced Multi-Objective Optimization Algorithms

Methods: ALIGN