Projection Optimization: A General Framework for Multi-Objective and Multi-Group RLHF

Nuoya Xiong; Aarti Singh

arXiv:2502.15145·cs.LG·February 25, 2025

TL;DR

This paper introduces a flexible and efficient framework for multi-objective and multi-group reinforcement learning with human feedback, enabling non-linear preference aggregation without retraining.

Contribution

It transforms non-linear aggregation into linear sub-problems for efficiency and extends to multi-group scenarios, with theoretical guarantees and a nearly training-free algorithm.

Findings

01. Framework achieves sublinear regret.

02. Efficiently handles non-linear preference aggregation.

03. Enables multi-group consensus with minimal retraining.

Abstract

Reinforcement Learning with Human Feedback (RLHF) is a widely used fine-tuning approach that aligns machine learning models, particularly Language Models (LMs), with human preferences. There are typically multiple objectives driving the preference; hence, humans find it easier to express per-objective comparisons rather than a global preference between two choices. Multi-Objective RLHF (MORLHF) aims to use per-objective preference feedback and achieve Pareto optimality among these objectives by aggregating them into a single unified objective for optimization. However, nearly all prior works rely on linear aggregation, which rules out policies that favor specific objectives such as the worst one. The only existing approach using non-linear aggregation is computationally expensive due to its reward-based nature and the need for retraining whenever the aggregation parameters change. In this…
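
The abstract's point that linear aggregation can rule out policies favoring the worst objective can be illustrated with a toy sketch. This is not the paper's algorithm: the three candidate policies, their reward numbers, and the helper names (`linear_score`, `worst_case_score`, `best_policy`) are all hypothetical, and the candidate set is finite for simplicity. The balanced policy is Pareto-optimal among the three, yet no choice of linear weights on the grid selects it, while a worst-case (max-min) aggregation does.

```python
# Hypothetical per-objective expected rewards for three candidate policies.
# The numbers are illustrative assumptions, not results from the paper.
rewards = {
    "policy_a": [0.9, 0.2],    # strong on objective 1, weak on objective 2
    "policy_b": [0.1, 0.9],    # weak on objective 1, strong on objective 2
    "policy_c": [0.45, 0.45],  # balanced across both objectives
}


def linear_score(r, weights):
    """Linear aggregation: weighted sum of per-objective rewards."""
    return sum(w * x for w, x in zip(weights, r))


def worst_case_score(r):
    """Non-linear aggregation: value of the worst objective (max-min)."""
    return min(r)


def best_policy(score_fn):
    """Return the candidate policy name that maximizes the given score."""
    return max(rewards, key=lambda name: score_fn(rewards[name]))


# Sweep linear weights (w, 1 - w) over a grid and record every winner.
linear_winners = set()
for i in range(101):
    w = i / 100
    linear_winners.add(best_policy(lambda r: linear_score(r, [w, 1 - w])))

# The worst-case aggregation picks the balanced policy instead.
worst_case_winner = best_policy(worst_case_score)

print("Winners under linear aggregation:", sorted(linear_winners))
print("Winner under worst-case aggregation:", worst_case_winner)
```

In this toy setting the balanced policy sits below the line joining the two specialized policies in reward space, so a weighted sum always prefers one of the specialists. That is the kind of gap a non-linear aggregation is meant to close.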
