Projection Optimization: A General Framework for Multi-Objective and Multi-Group RLHF
Nuoya Xiong, Aarti Singh

TL;DR
This paper introduces a flexible and efficient framework for multi-objective and multi-group reinforcement learning with human feedback, enabling non-linear preference aggregation without retraining.
Contribution
It transforms non-linear aggregation into linear sub-problems for efficiency and extends to multi-group scenarios, with theoretical guarantees and a nearly training-free algorithm.
Findings
Achieves sublinear regret.
Efficiently handles non-linear preference aggregation.
Enables multi-group consensus with minimal retraining.
Abstract
Reinforcement Learning with Human Feedback (RLHF) is a widely used fine-tuning approach that aligns machine learning models, particularly Language Models (LMs), with human preferences. Preferences are typically driven by multiple objectives, so humans find it easier to express per-objective comparisons than a global preference between two choices. Multi-Objective RLHF (MORLHF) aims to use per-objective preference feedback and achieve Pareto optimality among these objectives by aggregating them into a single unified objective for optimization. However, nearly all prior works rely on linear aggregation, which rules out policies that favor specific objectives such as the worst one. The only existing approach using non-linear aggregation is computationally expensive due to its reward-based nature and the need for retraining whenever the aggregation parameters change. In this…
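To make the abstract's contrast between linear aggregation and aggregation that favors the worst objective concrete, here is a minimal sketch. It is an illustration only, not the paper's algorithm: the function names, the two candidate policies, their per-objective reward values, and the equal weights are all hypothetical.

```python
def linear_aggregate(rewards, weights):
    """Weighted sum of per-objective rewards (linear aggregation)."""
    return sum(w * r for w, r in zip(weights, rewards))


def worst_case_aggregate(rewards):
    """Score a policy by its weakest objective (one non-linear aggregation)."""
    return min(rewards)


# Hypothetical per-objective rewards for two candidate policies.
policies = {
    "balanced": [0.6, 0.6],  # moderate on both objectives
    "lopsided": [1.0, 0.3],  # strong on one objective, weak on the other
}
weights = [0.5, 0.5]  # assumed equal weights, for illustration only

# Pick the highest-scoring policy under each aggregation rule.
best_linear = max(policies, key=lambda name: linear_aggregate(policies[name], weights))
best_worst_case = max(policies, key=lambda name: worst_case_aggregate(policies[name]))

print("linear aggregation prefers:", best_linear)
print("worst-case aggregation prefers:", best_worst_case)
```

Under these assumed numbers, the equal-weight linear score ranks the lopsided policy higher, while the worst-case score ranks the balanced policy higher, which is the kind of preference the abstract says linear aggregation tends to miss.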
