Generalized Preference Optimization: A Unified Approach to Offline Alignment
Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot

TL;DR
This paper introduces generalized preference optimization (GPO), a unified framework for offline preference-based model fine-tuning that encompasses existing methods and provides new insights into regularization effects and algorithmic trade-offs.
Contribution
The paper proposes GPO, a flexible family of offline preference optimization algorithms that unify existing methods and offer new variants, enhancing understanding of regularization in offline alignment.
Findings
GPO unifies existing offline preference optimization algorithms.
Different GPO variants achieve similar trade-offs between regularization and performance.
The choice of convex function influences regularization effects and algorithm behavior.
Abstract
Offline preference optimization allows fine-tuning large models directly from offline data, and has proved effective in recent alignment practices. We propose generalized preference optimization (GPO), a family of offline losses parameterized by a general class of convex functions. GPO enables a unified view over preference optimization, encompassing existing algorithms such as DPO, IPO and SLiC as special cases, while naturally introducing new variants. The GPO framework also sheds light on how offline algorithms enforce regularization, through the design of the convex function that defines the loss. Our analysis and experiments reveal the connections and subtle differences between the offline regularization and the KL divergence regularization intended by the canonical RLHF formulation. In a controlled setting akin to Gao et al. (2023), we also show that different GPO variants achieve…
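Illustrative sketch
The idea of a loss family "parameterized by a convex function" can be made concrete with a small Python sketch. It is not the authors' implementation. It assumes the loss takes the form of an average of f(beta * rho), where rho is the difference between the policy-to-reference log-ratios of the preferred and dispreferred responses, and it assumes the commonly cited choices of f: logistic for DPO, squared for IPO, and hinge for SLiC. The function names, the `beta` default, and the exact scaling constants are hypothetical and may differ from the paper's parameterization.

```python
import math


def logistic(x):
    # Assumed DPO case: log(1 + exp(-x)), written in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def squared(x):
    # Assumed IPO case: squared distance of the scaled margin from 1.
    return (x - 1.0) ** 2


def hinge(x):
    # Assumed SLiC case: hinge loss on the scaled margin.
    return max(0.0, 1.0 - x)


# Hypothetical registry mapping a variant name to its convex function f.
CONVEX_FNS = {"dpo": logistic, "ipo": squared, "slic": hinge}


def gpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1, variant="dpo"):
    """Average of f(beta * rho) over a batch of preference pairs.

    logp_w / logp_l: policy log-probabilities of the preferred / dispreferred
    responses; ref_logp_w / ref_logp_l: the same under the reference policy.
    """
    f = CONVEX_FNS[variant]
    total = 0.0
    for pw, pl, rw, rl in zip(logp_w, logp_l, ref_logp_w, ref_logp_l):
        # rho: how much more the policy favors the preferred response
        # than the reference policy does, in log-ratio terms.
        rho = (pw - rw) - (pl - rl)
        total += f(beta * rho)
    return total / len(logp_w)
```

Under these assumptions, swapping `variant` changes only the convex function applied to the scaled margin, and this is the single design choice through which each variant shapes its regularization. For instance, the hinge loss is exactly zero once the scaled margin reaches 1, whereas the squared loss penalizes margins above 1 as well.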
Taxonomy
Topics: Constraint Satisfaction and Optimization · Multi-Criteria Decision Making
Methods: Direct Preference Optimization
