LiPO: Listwise Preference Optimization through Learning-to-Rank
Tianqi Liu, Zhen Qin, Junru Wu, Jiaming Shen, Misha Khalman, Rishabh Joshi, Yao Zhao, Mohammad Saleh, Simon Baumgartner, Jialu Liu, Peter J. Liu, Xuanhui Wang

TL;DR
This paper introduces LiPO, a listwise preference optimization framework for language model alignment that leverages learning-to-rank techniques, demonstrating improved performance over existing pairwise methods like DPO and SLiC.
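DPO and SLiC both learn from one (preferred, dispreferred) pair at a time. The sketch below is a minimal illustration of that pairwise view, assuming each response y is scored by the implicit reward s = β log(π_θ(y|x) / π_ref(y|x)) computed from summed sequence log-probabilities. The SLiC-style hinge is written on the same log-ratio score for comparability, which is an assumption (the original SLiC calibration loss is a hinge on policy log-likelihoods). The values β = 0.1 and margin = 1.0, the log-probs, and all function names are hypothetical.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_ratio_score(policy_logp: float, ref_logp: float, beta: float = 0.1) -> float:
    """s = beta * (log pi_theta(y|x) - log pi_ref(y|x)), from summed sequence log-probs."""
    return beta * (policy_logp - ref_logp)


def dpo_pair_loss(s_win: float, s_lose: float) -> float:
    """DPO: pairwise logistic loss on the score gap of a (preferred, dispreferred) pair."""
    return -log_sigmoid(s_win - s_lose)


def hinge_pair_loss(s_win: float, s_lose: float, margin: float = 1.0) -> float:
    """SLiC-style pairwise hinge loss on the same score gap (assumed variant, margin is illustrative)."""
    return max(0.0, margin - (s_win - s_lose))


# Hypothetical sequence log-probs for one preferred and one dispreferred response.
s_w = log_ratio_score(policy_logp=-10.0, ref_logp=-12.0)  # 0.2
s_l = log_ratio_score(policy_logp=-15.0, ref_logp=-14.0)  # -0.1
print(dpo_pair_loss(s_w, s_l), hinge_pair_loss(s_w, s_l))
```

Both losses only see the score gap of a single pair, so a list of k ranked responses has to be broken into pairs before either can use it.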
Contribution
The paper formulates LM alignment as a listwise ranking problem and proposes LiPO-$\lambda$, a novel method that outperforms existing preference optimization approaches.
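The λ in LiPO-λ refers to Lambda-weighted learning-to-rank losses, which weight each preference pair by how much swapping it would change a ranking metric such as NDCG. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation: scores are log-ratio scores s_i as in DPO, labels ψ_i are graded (higher is better), gains are G_i = 2^ψ_i − 1, and the discount is the NDCG-style log2(1 + rank) with ranks induced by the current scores. The function name and example data are hypothetical.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def lambda_weighted_loss(scores: list[float], labels: list[float]) -> float:
    """Lambda-weighted pairwise logistic loss over one prompt's list of responses.

    scores: model scores s_i, e.g. beta * log(pi_theta(y_i|x) / pi_ref(y_i|x)).
    labels: graded preference labels psi_i (higher means more preferred).
    """
    n = len(scores)
    # Rank positions tau(i) induced by the current scores (1 = top).
    order = sorted(range(n), key=lambda i: -scores[i])
    rank = [0] * n
    for pos, i in enumerate(order, start=1):
        rank[i] = pos
    gain = [2.0 ** psi - 1.0 for psi in labels]        # G_i = 2^psi_i - 1
    inv_disc = [1.0 / math.log2(1 + r) for r in rank]  # 1 / D(tau(i)), D = log2(1 + tau)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] > labels[j]:
                # Lambda weight: NDCG-style cost of swapping i and j; in a
                # gradient-based setup it would be treated as a constant.
                delta = abs(gain[i] - gain[j]) * abs(inv_disc[i] - inv_disc[j])
                loss -= delta * log_sigmoid(scores[i] - scores[j])
    return loss


# Hypothetical list of four responses to one prompt with graded labels.
labels = [3.0, 2.0, 1.0, 0.0]
print(lambda_weighted_loss([2.0, 1.0, 0.0, -1.0], labels))  # well ordered: small loss
print(lambda_weighted_loss([-1.0, 0.0, 1.0, 2.0], labels))  # reversed: large loss
```

Compared with averaging a plain pairwise loss over all pairs, the Lambda weight lets mistakes near the top of the list, and between responses with very different labels, dominate the objective.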
Findings
LiPO-$\lambda$ outperforms DPO variants and SLiC on preference alignment tasks.
The listwise approach effectively utilizes ranked response data for better alignment.
The study provides a thorough analysis of ranking objectives in LM preference optimization.
Abstract
Aligning language models (LMs) with curated human feedback is critical for controlling their behavior in real-world applications. Several recent policy optimization methods, such as DPO and SLiC, serve as promising alternatives to the traditional Reinforcement Learning from Human Feedback (RLHF) approach. In practice, human feedback often comes in the form of a ranked list over multiple responses, which amortizes the cost of reading the prompt. Multiple responses can also be ranked by reward models or AI feedback. However, directly fitting the policy to a list of responses has not been studied thoroughly. In this work, we formulate LM alignment as a listwise ranking problem and describe the LiPO framework, where the policy can potentially learn more effectively from a ranked list of plausible responses given the prompt. This view draws an explicit connection to Learning-to-Rank (LTR), where most…
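To make "directly fitting upon a list of responses" concrete, here is one classic listwise objective from learning-to-rank, ListMLE, which minimizes the negative Plackett-Luce log-likelihood of the observed ranking. This is a hedged illustration and not a claim about which objectives the paper evaluates; it assumes the responses arrive sorted best-first and are scored with a log-ratio score s_i as in DPO. The names logsumexp and list_mle_loss are hypothetical.

```python
import math


def logsumexp(xs: list[float]) -> float:
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def list_mle_loss(scores_best_first: list[float]) -> float:
    """Negative Plackett-Luce log-likelihood of an observed ranking (ListMLE).

    scores_best_first: scores s_i ordered from most to least preferred response.
    """
    loss = 0.0
    for k in range(len(scores_best_first)):
        remaining = scores_best_first[k:]
        # -log P(the k-th ranked response is picked first among those remaining)
        loss += logsumexp(remaining) - remaining[0]
    return loss


# With two responses this equals the DPO loss -log sigmoid(s_w - s_l).
print(list_mle_loss([0.3, 0.0]), math.log1p(math.exp(-0.3)))
# A ranking the scores agree with costs less than the reversed one.
print(list_mle_loss([2.0, 1.0, 0.0]), list_mle_loss([0.0, 1.0, 2.0]))
```

The two-response case shows why the listwise view generalizes the pairwise one: for a list of length two, the Plackett-Luce likelihood collapses to the Bradley-Terry pair probability that DPO optimizes.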
Taxonomy
Topics: Data Management and Algorithms
Methods: Direct Preference Optimization
