Distributional Preference Alignment of LLMs via Optimal Transport
Igor Melnyk, Youssef Mroueh, Brian Belgodere, Mattia Rigotti, Apoorva Nitsure, Mikhail Yurochkin, Kristjan Greenewald, Jiri Navratil, Jerret Ross

TL;DR
This paper introduces Alignment via Optimal Transport (AOT), a novel distributional preference alignment method for LLMs. AOT improves alignment quality by making the reward distribution of positive samples first-order stochastically dominate that of negative samples, using a convex optimal transport formulation.
Contribution
The paper proposes a new distributional alignment method for LLMs based on optimal transport. Its convex relaxation admits a closed-form solution via sorting, and the method achieves state-of-the-art results.
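The closed-form solution rests on a standard fact about one-dimensional optimal transport. When the cost is a convex function of the difference x - y, the monotone coupling that matches sorted samples is optimal. The sketch below checks this numerically on small equal-size samples by comparing the sorted matching against an exhaustive search over all matchings. The logistic cost and the helper names (`convex_cost`, `sorted_ot_cost`, `brute_force_ot_cost`) are illustrative assumptions and are not taken from the paper.

```python
import itertools

import numpy as np


def convex_cost(diff: np.ndarray) -> np.ndarray:
    # Assumption: logistic cost log(1 + exp(-d)), a smooth convex function of
    # the difference d = x - y; the paper's exact cost may differ.
    return np.logaddexp(0.0, -diff)


def sorted_ot_cost(x: np.ndarray, y: np.ndarray) -> float:
    # Monotone (quantile) coupling: pair the i-th smallest x with the i-th smallest y.
    return float(np.mean(convex_cost(np.sort(x) - np.sort(y))))


def brute_force_ot_cost(x: np.ndarray, y: np.ndarray) -> float:
    # Exhaustive search over all one-to-one matchings (feasible only for tiny n).
    best = np.inf
    for perm in itertools.permutations(range(len(y))):
        best = min(best, float(np.mean(convex_cost(x - y[list(perm)]))))
    return best


rng = np.random.default_rng(0)
x = rng.normal(size=6)
y = rng.normal(size=6)
print(np.isclose(sorted_ot_cost(x, y), brute_force_ot_cost(x, y)))  # True
```

The brute-force search grows factorially with the sample size, while the sorted matching costs O(n log n). That gap is why the sorting shortcut matters at batch scale.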
Findings
AOT achieves state-of-the-art performance on multiple alignment datasets.
Sample complexity analysis shows that the method converges at the parametric rate.
Empirical results show improved alignment in 7B LLMs on benchmark evaluations.
Abstract
Current LLM alignment techniques use pairwise human preferences at the sample level, and as such, they do not imply alignment at the distributional level. In this paper we propose Alignment via Optimal Transport (AOT), a novel method for distributional preference alignment of LLMs. AOT aligns LLMs on unpaired preference data by making the reward distribution of the positive samples first-order stochastically dominant over the reward distribution of the negative samples. We introduce a convex relaxation of this first-order stochastic dominance and cast it as an optimal transport problem with a smooth and convex cost. Because the resulting optimal transport problem is one-dimensional and its cost is convex, it has a closed-form solution via sorting on empirical measures. We fine-tune LLMs with this AOT objective, which enables alignment by penalizing the violation of the…
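To make the objective described in the abstract concrete, here is a minimal NumPy sketch of an unpaired, sorting-based dominance penalty. It assumes that each sample already has a scalar reward (for instance an implicit log-likelihood-ratio reward, as in DPO-style training), that there are equally many positive and negative samples, and that the violation is scored with a squared-hinge surrogate controlled by a hypothetical `margin` parameter. The paper's exact cost, reward definition, and scaling may differ. In actual fine-tuning the same computation would run inside an autodiff framework so that gradients flow through the sorted values.

```python
import numpy as np


def squared_hinge(z: np.ndarray, margin: float = 1.0) -> np.ndarray:
    # Assumption: smooth convex surrogate of the 0/1 violation indicator 1[z < 0].
    # It is zero once z >= margin and grows quadratically below it.
    return np.maximum(margin - z, 0.0) ** 2


def aot_unpaired_loss(pos_rewards, neg_rewards, margin: float = 1.0) -> float:
    """Relaxed first-order stochastic dominance penalty on empirical samples.

    Dominance of positives over negatives means every quantile of the positive
    rewards is >= the matching quantile of the negative rewards. With equal
    batch sizes the empirical quantiles are the sorted samples, so the 1D
    optimal transport plan pairs them in sorted order.
    """
    u = np.sort(np.asarray(pos_rewards, dtype=float))
    v = np.sort(np.asarray(neg_rewards, dtype=float))
    if u.shape != v.shape:
        raise ValueError("this sketch assumes equal numbers of positive and negative samples")
    return float(np.mean(squared_hinge(u - v, margin)))


def fsd_violation_rate(pos_rewards, neg_rewards) -> float:
    # Fraction of quantile levels where the positive quantile falls below the negative one.
    u = np.sort(np.asarray(pos_rewards, dtype=float))
    v = np.sort(np.asarray(neg_rewards, dtype=float))
    return float(np.mean(u < v))


pos = np.array([2.0, 0.5, 3.0, 1.0])
neg = np.array([0.0, 1.5, -1.0, 0.2])
print(aot_unpaired_loss(pos, neg), fsd_violation_rate(pos, neg))  # 0.0 0.0
```

In this example the sorted positive rewards exceed the sorted negative rewards by at least the margin at every quantile, so the penalty is zero. Pairing the arrays index by index would instead show a violation (0.5 - 1.5 < 0). Distributional dominance can therefore hold even when a particular sample-level pairing does not.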
Peer Reviews
Decision: NeurIPS 2024 poster
The framework of the paper is very easy to follow. The proposed problem and the direction of the solution are interesting. The settings and theoretical results are adequate to support the proposed method.
My main concern with this paper is the empirical results in the experiment section. Table 1 shows that AOT paired/unpaired does not outperform the other methods in at least 4 out of 7 cases (ARC, MMLU, Winogrande, GSM8K). When the two are compared against each other, there is no clear winner between AOT paired and AOT unpaired. Moreover, I would expect the task to be easier for AOT paired, since it has access to more information; please correct me if I am wrong. I have similar concerns about their performance in Figure 2.
Taxonomy
Topics: Advanced Manufacturing and Logistics Optimization · Optimization and Packing Problems · Vehicle Routing Optimization Methods
Methods: Sparse Evolutionary Training
