K-order Ranking Preference Optimization for Large Language Models
Shihao Cai, Chongming Gao, Yang Zhang, Wentao Shi, Jizhi Zhang, Keqin Bao, Qifan Wang, Fuli Feng

TL;DR
This paper introduces K-order Ranking Preference Optimization (KPO), a novel method focusing on optimizing top-K ranking accuracy for large language models, improving relevance and robustness in real-world applications.
Contribution
The paper extends the DPO model to prioritize top-K rankings, dynamically adjusts K per query, and employs curriculum learning for efficient training, addressing practical ranking needs.
Findings
KPO outperforms existing methods in ranking accuracy.
KPO demonstrates high sample efficiency and robustness to noise.
Dynamic K adjustment improves ranking relevance.
Abstract
To adapt large language models (LLMs) to ranking tasks, existing list-wise methods, represented by list-wise Direct Preference Optimization (DPO), focus on optimizing partial-order or full-order list ranking consistency for LLMs to enhance their ranking abilities. However, we argue that optimizing top-K ranking consistency could be more appropriate for real-world applications. There are two main reasons: (1) users are typically concerned with only the top-K results, making top-K ranking more important, and (2) tail items often lack precise feedback, making top-K ranking more reliable. Based on this, we propose K-order Ranking Preference Optimization (KPO) by extending the DPO's Plackett-Luce model to accommodate top-K rankings. Additionally, recognizing that the number of important items can vary across queries, we extend KPO to dynamically determine appropriate K for different samples…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsRough Sets and Fuzzy Logic · Data Management and Algorithms
