Geometric-Averaged Preference Optimization for Soft Preference Labels
Hiroki Furuta, Kuang-Huei Lee, Shixiang Shane Gu, Yutaka Matsuo, Aleksandra Faust, Heiga Zen, Izzeddin Gur

TL;DR
This paper introduces a geometric-averaged preference optimization method that incorporates distributional human preferences into LLM alignment, improving performance and reducing over-optimization issues.
Contribution
It proposes a weighted geometric average of LLM output likelihoods in the DPO loss for soft preference labels, improving alignment and mitigating the over-optimization and objective mismatch of prior DPO-based methods.
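One way to write the weighted geometric average out is the short derivation below. It is our reconstruction from the abstract, not the paper's exact notation. It assumes a soft label p̂ = p(y_1 ≻ y_2 | x) in [0.5, 1], applies the same averaging to the policy π and the reference π_ref, and assumes the normalizers are shared by both sides of the pair so they cancel in the margin.

```latex
\bar{\pi}(y_1 \mid x) \propto \pi(y_1 \mid x)^{\hat{p}}\, \pi(y_2 \mid x)^{1-\hat{p}}, \qquad
\bar{\pi}(y_2 \mid x) \propto \pi(y_2 \mid x)^{\hat{p}}\, \pi(y_1 \mid x)^{1-\hat{p}}

r_i = \log \frac{\pi(y_i \mid x)}{\pi_{\mathrm{ref}}(y_i \mid x)}, \quad i \in \{1, 2\}

\log \frac{\bar{\pi}(y_1 \mid x)}{\bar{\pi}_{\mathrm{ref}}(y_1 \mid x)} - \log \frac{\bar{\pi}(y_2 \mid x)}{\bar{\pi}_{\mathrm{ref}}(y_2 \mid x)}
  = \hat{p}\, r_1 + (1-\hat{p})\, r_2 - \hat{p}\, r_2 - (1-\hat{p})\, r_1
  = (2\hat{p} - 1)(r_1 - r_2)

\mathcal{L}(\hat{p}) = -\log \sigma\big(\beta\, (2\hat{p} - 1)(r_1 - r_2)\big)
```

With p̂ = 1 this reduces to the standard DPO loss. With p̂ = 0.5 the argument of σ is zero, so the loss is the constant log 2 and its gradient vanishes.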
Findings
Improved alignment performance on standard benchmarks.
More preferable responses with soft preference labels.
Significant gains with modestly confident labels.
Abstract
Many algorithms for aligning LLMs with human preferences assume that human preferences are binary and deterministic. However, human preferences can vary across individuals, and therefore should be represented distributionally. In this work, we introduce the distributional soft preference labels and improve Direct Preference Optimization (DPO) with a weighted geometric average of the LLM output likelihood in the loss function. This approach adjusts the scale of learning loss based on the soft labels such that the loss would approach zero when the responses are closer to equally preferred. This simple modification can be easily applied to any DPO-based method and mitigates over-optimization and objective mismatch, which prior works suffer from. Our experiments simulate the soft preference labels with AI feedback from LLMs and demonstrate that geometric averaging consistently improves…
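The effect of the soft label on the loss scale can be sketched in plain Python. This is a minimal illustration, not the authors' implementation. It assumes sequence-level log-likelihoods are already computed, that the soft label `p_hat` is the probability that `y1` is preferred (between 0.5 and 1), and that the geometric-average normalizers cancel in the margin. All function names and the `beta = 0.1` default are hypothetical choices for illustration.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def implicit_reward_margin(logp_1, logp_2, ref_logp_1, ref_logp_2):
    """DPO margin h = log pi(y1)/pi_ref(y1) - log pi(y2)/pi_ref(y2)."""
    return (logp_1 - ref_logp_1) - (logp_2 - ref_logp_2)


def geometric_average_margin(logp_1, logp_2, ref_logp_1, ref_logp_2, p_hat):
    """Margin built from p_hat-weighted geometric averages of the pair.

    Log of the (unnormalized) weighted geometric average: p_hat*log pi(y1) +
    (1-p_hat)*log pi(y2) for the preferred side, mirrored for the other side.
    Assumption: a shared normalizer, which cancels in the difference.
    """
    avg_1 = p_hat * logp_1 + (1 - p_hat) * logp_2
    avg_2 = p_hat * logp_2 + (1 - p_hat) * logp_1
    ref_avg_1 = p_hat * ref_logp_1 + (1 - p_hat) * ref_logp_2
    ref_avg_2 = p_hat * ref_logp_2 + (1 - p_hat) * ref_logp_1
    return (avg_1 - ref_avg_1) - (avg_2 - ref_avg_2)


def dpo_loss(h, beta=0.1):
    """Standard DPO loss for a hard label (y1 preferred)."""
    return -log_sigmoid(beta * h)


def gdpo_loss(h, p_hat, beta=0.1):
    """Geometric-averaged loss: the margin is scaled by (2*p_hat - 1)."""
    if not 0.5 <= p_hat <= 1.0:
        raise ValueError("expected 0.5 <= p_hat <= 1 (probability that y1 is preferred)")
    return -log_sigmoid(beta * (2 * p_hat - 1) * h)


def gdpo_grad_wrt_margin(h, p_hat, beta=0.1):
    """d loss / d h = -c * sigmoid(-c*h), with c = beta * (2*p_hat - 1)."""
    c = beta * (2 * p_hat - 1)
    return -c / (1 + math.exp(c * h))


# Usage: one response pair with made-up log-likelihoods.
h = implicit_reward_margin(logp_1=-12.0, logp_2=-15.0, ref_logp_1=-13.0, ref_logp_2=-14.0)
for p_hat in (1.0, 0.75, 0.5):
    print(p_hat, gdpo_loss(h, p_hat), gdpo_grad_wrt_margin(h, p_hat))
```

As `p_hat` moves from 1.0 toward 0.5, the gradient with respect to the margin shrinks toward zero, so nearly tied pairs push the policy less. At `p_hat = 1.0` the loss coincides with `dpo_loss`.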
Taxonomy
Topics: Multi-Criteria Decision Making
Methods: Direct Preference Optimization
