$f$-PO: Generalizing Preference Optimization with $f$-divergence Minimization
Jiaqi Han, Mingjian Jiang, Yuxuan Song, Stefano Ermon, Minkai Xu

TL;DR
This paper introduces $f$-PO, a unified framework for preference optimization that generalizes existing methods using $f$-divergences, demonstrating improved performance on language model alignment benchmarks.
Contribution
The paper proposes $f$-PO, a novel, theoretically grounded framework that unifies and extends preference optimization methods with various divergence choices.
Findings
$f$-PO outperforms existing methods on multiple benchmarks.
Different $f$-divergences impact regularization and performance trade-offs.
Theoretical analysis clarifies properties of $f$-PO.
Abstract
Preference optimization has made significant progress recently, with numerous methods developed to align language models with human preferences. This paper introduces $f$-divergence Preference Optimization ($f$-PO), a novel framework that generalizes and extends existing approaches. $f$-PO minimizes $f$-divergences between the optimized policy and the optimal policy, encompassing a broad family of alignment methods using various divergences. Our approach unifies previous algorithms like DPO and EXO, while offering new variants through different choices of $f$-divergences. We provide theoretical analysis of $f$-PO's properties and conduct extensive experiments on state-of-the-art language models using benchmark datasets. Results demonstrate $f$-PO's effectiveness across various tasks, achieving superior performance compared to existing methods on popular benchmarks such as AlpacaEval 2,…
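The abstract describes $f$-PO as minimizing an $f$-divergence between the trained policy and the optimal policy, with DPO and EXO as special cases. The Python sketch below illustrates that idea numerically. It is a sketch under stated assumptions, not the paper's exact objective. It assumes a two-sample construction: for a chosen response $y_w$ and a rejected response $y_l$, the model distribution is the softmax of the implicit rewards $\beta \log \pi_\theta(y|x) / \pi_{\text{ref}}(y|x)$, and the target is a label-smoothed preference $(1-\epsilon, \epsilon)$. The names `pairwise_f_loss`, `GENERATORS`, and `eps` are hypothetical. Swapping the generator $f$ changes the divergence. With $f(t) = -\log t$ (forward KL), the loss approaches the standard DPO loss $-\log \sigma(h)$ as $\epsilon \to 0$. With $f(t) = t \log t$, it gives the reverse-KL direction commonly associated with EXO.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Convex generators f with f(1) = 0, used in D_f(Q || P) = sum_i P(i) * f(Q(i) / P(i)).
GENERATORS = {
    "reverse_kl": lambda t: t * math.log(t),             # KL(Q || P)
    "forward_kl": lambda t: -math.log(t),                # KL(P || Q)
    "js_times_2": lambda t: t * math.log(t) - (1 + t) * math.log((1 + t) / 2),  # 2 * JS(P, Q)
    "sq_hellinger": lambda t: (math.sqrt(t) - 1) ** 2,   # sum_i (sqrt(Q_i) - sqrt(P_i))^2
}

def f_divergence(q, p, f):
    """D_f(q || p) for two strictly positive discrete distributions of equal length."""
    return sum(p_i * f(q_i / p_i) for q_i, p_i in zip(q, p))

def pairwise_f_loss(logratio_w, logratio_l, beta, eps, name):
    """Hypothetical pairwise objective (illustration only, not the paper's exact loss).

    logratio_* = log pi_theta(y|x) - log pi_ref(y|x) for the chosen (w) and rejected (l)
    responses. The model distribution over {y_w, y_l} is the softmax of the implicit
    rewards beta * logratio; the target is the label-smoothed preference (1 - eps, eps).
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    h = beta * (logratio_w - logratio_l)
    q = (sigmoid(h), sigmoid(-h))  # two-way softmax of the implicit rewards
    p = (1.0 - eps, eps)           # assumed label-smoothed target
    return f_divergence(q, p, GENERATORS[name])

if __name__ == "__main__":
    lw, ll, beta = 0.3, -0.2, 2.0  # made-up log-ratios, for illustration only
    for name in GENERATORS:
        print(f"{name:>13}: {pairwise_f_loss(lw, ll, beta, 0.1, name):.4f}")
    # As eps -> 0, the forward-KL loss approaches the DPO loss -log sigmoid(h).
    dpo = -math.log(sigmoid(beta * (lw - ll)))
    fkl = pairwise_f_loss(lw, ll, beta, 1e-9, "forward_kl")
    print(f"forward_kl(eps=1e-9) = {fkl:.6f}, DPO = {dpo:.6f}")
```

The sketch requires $0 < \epsilon < 1$ because a zero-mass target entry makes $q_i / p_i$ undefined. The reverse-KL value also grows without bound as $\epsilon \to 0$, which is one reason a smoothed target is used here.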
Taxonomy
Topics: Multi-Criteria Decision Making
Methods: ALIGN · Direct Preference Optimization
