Holistic Utility Preference Learning for Listwise Alignment

Jiacong Zhou; Xianyun Wang; Min Zhang; Jun Yu

arXiv:2410.18127 · cs.IR · December 17, 2025

TL;DR

This paper presents DRPO, a listwise learning-to-rank method that uses a differentiable NDCG objective to better align language models with human preferences, outperforming pairwise approaches.
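As a reference point for the NDCG metric named above, the following minimal Python sketch computes standard (non-differentiable) NDCG for a list of candidate responses. Here `scores` stands for whatever per-response score a model assigns and `relevances` for graded preference labels; the function names, the exponential gain `2**rel - 1`, and the `log2(position + 1)` discount are common conventions assumed for illustration, not details taken from the paper.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float]) -> float:
    """DCG of relevance labels listed in ranked order (position 1 first).

    Assumed convention: gain 2**rel - 1, discount log2(position + 1).
    """
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg(scores: Sequence[float], relevances: Sequence[float]) -> float:
    """NDCG of the ranking obtained by sorting items by `scores`, highest first."""
    # Hard sort: this step is what makes NDCG non-differentiable in `scores`.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [relevances[i] for i in order]
    # Normalize by the DCG of the best possible ordering of the same labels.
    ideal_dcg = dcg(sorted(relevances, reverse=True))
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical example: three responses, labels 3 > 2 > 1.
print(ndcg([0.9, 0.2, 0.5], [3, 1, 2]))  # scores agree with labels
print(ndcg([0.1, 0.9, 0.5], [3, 1, 2]))  # scores invert the labels
```

Because the hard sort makes NDCG piecewise constant in `scores`, its gradient with respect to the scores is zero almost everywhere, which is why a differentiable surrogate is needed for end-to-end training.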

Contribution

The paper introduces DRPO, a novel listwise preference optimization method that leverages holistic list rankings and differentiable NDCG for better alignment.

Findings

1. DRPO outperforms existing pairwise methods in response quality.
2. The diffNDCG loss enables end-to-end training directly against an NDCG-based objective.
3. The Adaptive Rank Policy Score improves the discriminability of responses.
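The idea that a differentiable NDCG loss allows end-to-end training can be illustrated with a smooth surrogate. The sketch below is not the paper's diffNDCG: it swaps in a simpler ApproxNDCG-style relaxation, in which each hard rank is replaced by a sigmoid-smoothed soft rank controlled by a `temperature` parameter. All names (`soft_ranks`, `approx_ndcg_loss`) are hypothetical. It is written in plain Python for readability; in training, the same arithmetic would be expressed in an autodiff framework so that gradients flow back to the scores.

```python
import math
from typing import List, Sequence


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_ranks(scores: Sequence[float], temperature: float) -> List[float]:
    """Smooth rank of each item: 1 + sum over j != i of sigmoid((s_j - s_i) / T).

    As temperature -> 0 (and without ties) this approaches the hard rank,
    where rank 1 is the highest score.
    """
    n = len(scores)
    return [
        1.0 + sum(_sigmoid((scores[j] - scores[i]) / temperature) for j in range(n) if j != i)
        for i in range(n)
    ]


def approx_ndcg_loss(
    scores: Sequence[float], relevances: Sequence[float], temperature: float = 1.0
) -> float:
    """Hypothetical ApproxNDCG-style loss: 1 - (smoothed DCG / ideal DCG)."""
    ranks = soft_ranks(scores, temperature)
    # Same gain/discount convention as hard DCG, but evaluated at soft ranks.
    smooth_dcg = sum((2 ** rel - 1) / math.log2(1.0 + r) for rel, r in zip(relevances, ranks))
    ideal = sorted(relevances, reverse=True)
    ideal_dcg = sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(ideal))
    return 1.0 - smooth_dcg / ideal_dcg if ideal_dcg > 0 else 0.0


labels = [3, 1, 2]
print(approx_ndcg_loss([3.0, 1.0, 2.0], labels, temperature=1.0))  # scores agree with labels
print(approx_ndcg_loss([1.0, 3.0, 2.0], labels, temperature=1.0))  # worse ordering, larger loss
```

Lower temperatures make the soft ranks approach the hard ranks, so the loss approaches 1 - NDCG but the gradients become sharper and less informative; higher temperatures smooth the objective at the cost of biasing it away from true NDCG.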

Abstract

Aligning large language models with human preferences is essential for improving interaction quality and safety, since it ensures that outputs better reflect human values. A promising strategy is Reinforcement Learning from Human Feedback (RLHF), which starts by collecting responses generated by a supervised fine-tuning model and ranking them to refine alignment. Existing methods such as Direct Preference Optimization (DPO) focus on pairwise comparisons: they split responses into preferred and less-preferred pairs and optimize the pairwise margins. However, this pairwise approach cannot capture the holistic ranking relationships among multiple responses, nor can it effectively leverage the rich preference information available in listwise comparisons. To address this challenge, this paper introduces Direct Ranking Preference Optimization (DRPO), a novel…
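For contrast with DRPO's listwise objective, the pairwise DPO loss mentioned in the abstract can be sketched for a single preference pair using the standard DPO formulation, -log sigmoid(beta * margin), where the margin compares policy-versus-reference log-probability ratios of the preferred and less-preferred responses. The sketch assumes summed sequence log-probabilities are already computed and passed in as plain floats; the function name, argument names, and the default `beta=0.1` are illustrative assumptions.

```python
import math


def dpo_pair_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """Pairwise DPO loss for one (preferred, less-preferred) response pair.

    Inputs are assumed to be summed token log-probabilities of each response
    under the trainable policy and a frozen reference model.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) favours the preferred response over the less-preferred one.
    margin = beta * (
        (policy_logp_chosen - ref_logp_chosen)
        - (policy_logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(margin)) == softplus(-margin), computed in a numerically stable way.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Hypothetical log-probabilities: the policy has shifted mass toward the preferred response.
print(dpo_pair_loss(-10.0, -12.0, -11.0, -11.0))
```

Each call sees only one preferred/less-preferred pair, so a ranked list of several responses must be broken into independent pairs; this is the limitation that the abstract says DRPO's listwise formulation is meant to address.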

Taxonomy

Topics: Data Management and Algorithms