Distributional Preference Alignment of LLMs via Optimal Transport

Igor Melnyk; Youssef Mroueh; Brian Belgodere; Mattia Rigotti; Apoorva Nitsure; Mikhail Yurochkin; Kristjan Greenewald; Jiri Navratil; Jerret Ross

arXiv:2406.05882 · cs.LG · June 11, 2024 · 2 citations


TL;DR

This paper introduces Alignment via Optimal Transport (AOT), a novel distributional preference alignment method for LLMs. AOT improves alignment quality by making the reward distribution of positive (preferred) samples stochastically dominate that of negative samples, which it enforces through a convex optimal transport formulation.
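To make "stochastic dominance" concrete: the reward distribution of positive samples first-order dominates that of negative samples when its CDF lies on or below the negative CDF at every threshold. Below is a minimal NumPy check on empirical samples; `first_order_dominates` is a hypothetical helper written for illustration, not part of the paper's code.

```python
import numpy as np


def first_order_dominates(pos, neg):
    """Return True if the empirical law of `pos` first-order stochastically
    dominates the empirical law of `neg`, i.e. F_pos(t) <= F_neg(t) for all t.
    (Hypothetical illustration helper.)"""
    pos = np.sort(np.asarray(pos, dtype=float))
    neg = np.sort(np.asarray(neg, dtype=float))
    # Empirical CDFs are step functions that only jump at observed values,
    # so checking every pooled sample point covers all thresholds t.
    grid = np.union1d(pos, neg)
    cdf_pos = np.searchsorted(pos, grid, side="right") / pos.size
    cdf_neg = np.searchsorted(neg, grid, side="right") / neg.size
    return bool(np.all(cdf_pos <= cdf_neg))


# Usage: upward-shifted positive rewards dominate; crossing CDFs do not.
print(first_order_dominates([2, 3, 4], [1, 2, 3]))  # True
print(first_order_dominates([0, 5], [1, 2]))        # False
```

For equally sized samples the same test reduces to comparing sorted arrays elementwise (`np.all(np.sort(pos) >= np.sort(neg))`), which is the quantile view that the sorting-based closed form relies on.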

Contribution

The paper proposes a new distributional alignment method for LLMs based on optimal transport. It relaxes first-order stochastic dominance into a convex problem that admits a closed-form solution via sorting, and it reports state-of-the-art results.
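Why sorting suffices: for two empirical measures with the same number of points and a cost of the form h(u - v) with h convex, the optimal transport plan is the monotone matching that pairs order statistics. The sketch below checks this numerically against a brute-force search over all matchings. The squared hinge with margin 1.0 is an assumed example of a smooth convex cost, and all function names are hypothetical.

```python
import itertools

import numpy as np


def squared_hinge(z, margin=1.0):
    """Assumed smooth convex cost on z = u - v: zero once z >= margin."""
    return np.maximum(0.0, margin - z) ** 2


def sorted_coupling_cost(pos, neg, cost=squared_hinge):
    """Closed form: match the i-th smallest positive with the i-th smallest negative."""
    pos = np.sort(np.asarray(pos, dtype=float))
    neg = np.sort(np.asarray(neg, dtype=float))
    return float(np.mean(cost(pos - neg)))


def brute_force_ot_cost(pos, neg, cost=squared_hinge):
    """Exact OT cost between two uniform empirical measures of equal size.

    An optimal plan can always be taken to be a one-to-one matching
    (Birkhoff-von Neumann), so we try every permutation; only feasible for tiny n.
    """
    pos = np.asarray(pos, dtype=float)
    neg = np.asarray(neg, dtype=float)
    return min(
        float(np.mean(cost(pos - neg[list(perm)])))
        for perm in itertools.permutations(range(neg.size))
    )


rng = np.random.default_rng(0)
for _ in range(20):
    pos, neg = rng.normal(0.5, 1.0, 6), rng.normal(0.0, 1.0, 6)
    gap = sorted_coupling_cost(pos, neg) - brute_force_ot_cost(pos, neg)
    assert abs(gap) < 1e-9, gap
print("sorted matching matched the brute-force optimum in all 20 trials")
```

The brute-force search grows as n!, so it serves only as a sanity check; the sorted version costs O(n log n).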

Findings

01. AOT achieves state-of-the-art performance on multiple alignment datasets.

02. A sample complexity analysis shows that the method converges at the parametric rate.

03. Empirical results show improved alignment of 7B LLMs on benchmark evaluations.
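As a rough illustration of the parametric-rate finding (a Monte Carlo sketch, not the paper's analysis), the snippet below estimates a sorted, logistic-surrogate dominance loss from n samples and measures its error against a population value known in closed form. The Gaussian setup, the logistic surrogate, and the helper names are assumptions. If the estimator converges at rate n^(-1/2), the printed sqrt(n)*err column should stay roughly flat.

```python
import numpy as np


def sorted_logistic_loss(pos, neg, beta=1.0):
    """Empirical dominance loss: logistic surrogate on sorted reward gaps (assumed form)."""
    delta = np.sort(pos) - np.sort(neg)
    return float(np.mean(np.logaddexp(0.0, -beta * delta)))


def mean_abs_error(n, trials=200, beta=1.0, seed=0):
    """Average |empirical loss - population loss| over repeated draws of size n."""
    rng = np.random.default_rng(seed)
    # For N(1, 1) vs N(0, 1) every population quantile gap equals 1, so the
    # population loss is exactly log(1 + exp(-beta)).
    target = float(np.logaddexp(0.0, -beta))
    errs = [
        abs(sorted_logistic_loss(rng.normal(1.0, 1.0, n), rng.normal(0.0, 1.0, n), beta) - target)
        for _ in range(trials)
    ]
    return float(np.mean(errs))


sizes = [16, 64, 256, 1024]
errors = [mean_abs_error(n) for n in sizes]
for n, err in zip(sizes, errors):
    # Under an n^(-1/2) rate, sqrt(n) * err should stay roughly constant.
    print(f"n={n:5d}  mean|err|={err:.4f}  sqrt(n)*err={np.sqrt(n) * err:.3f}")
```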

Abstract

Current LLM alignment techniques use pairwise human preferences at a sample level, and as such, they do not imply an alignment on the distributional level. We propose in this paper Alignment via Optimal Transport (AOT), a novel method for distributional preference alignment of LLMs. AOT aligns LLMs on unpaired preference data by making the reward distribution of the positive samples stochastically dominant in the first order on the distribution of negative samples. We introduce a convex relaxation of this first-order stochastic dominance and cast it as an optimal transport problem with a smooth and convex cost. Thanks to the one-dimensional nature of the resulting optimal transport problem and the convexity of the cost, it has a closed-form solution via sorting on empirical measures. We fine-tune LLMs with this AOT objective, which enables alignment by penalizing the violation of the…
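A minimal NumPy sketch of an unpaired, sorting-based dominance penalty in the spirit of the AOT objective, under assumptions the abstract does not state: each response has a scalar reward (in LLM fine-tuning this might be an implicit reward such as a policy-to-reference log-probability ratio, which is an assumption here), the positive and negative batches have equal size, and a logistic surrogate with a hypothetical temperature `beta` stands in for the paper's smooth convex cost. `aot_unpaired_loss` is an illustrative name, not the authors' code.

```python
import numpy as np


def aot_unpaired_loss(pos_rewards, neg_rewards, beta=1.0):
    """Penalize violations of first-order dominance between two reward batches.

    Sketch assumptions: equal batch sizes, scalar rewards, and the logistic
    surrogate log(1 + exp(-beta * z)) as the smooth convex cost.
    """
    pos = np.asarray(pos_rewards, dtype=float)
    neg = np.asarray(neg_rewards, dtype=float)
    if pos.ndim != 1 or pos.shape != neg.shape:
        raise ValueError("this sketch expects two 1-D batches of equal size")
    # Sorting both batches realizes the optimal 1-D transport plan for a
    # convex cost of the gap, so rewards are compared quantile by quantile
    # rather than pair by pair.
    delta = np.sort(pos) - np.sort(neg)
    # np.logaddexp(0, x) evaluates log(1 + exp(x)) without overflow.
    return float(np.mean(np.logaddexp(0.0, -beta * delta)))


print(round(aot_unpaired_loss([1, 2, 3], [-1, 0, 1]), 4))  # 0.1269: dominance holds
print(round(aot_unpaired_loss([-1, 0, 1], [1, 2, 3]), 4))  # 2.1269: dominance violated
```

In actual fine-tuning the same computation would be written in an autodiff framework so that gradients flow through the sorted rewards back to the model; sorting only permutes values, so the loss remains differentiable almost everywhere with respect to the rewards.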

Peer Reviews

Decision: NeurIPS 2024 poster

Reviewer 01 · Rating: 4 · Confidence: 2

Strengths

The framework of the paper is very easy to follow. The proposed problem and the direction of the solution are interesting. The settings and theoretical results are adequate to support the proposed method.

Weaknesses

My main concern with this paper is the empirical results in the experiment section. Table 1 shows that AOT paired/unpaired do not outperform other methods in at least 4 out of 7 cases (ARC, MMLU, Winogrande, GSM8K). When compared with each other, there is no clear winner between AOT paired and AOT unpaired. Moreover, I would expect AOT paired, which has access to more information, to face an easier task; please correct me if I am wrong. I have similar concerns about their performance in Figure 2.

Videos

Distributional Preference Alignment of LLMs via Optimal Transport (SlidesLive)
