Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization

Yuxin Jiang; Bo Huang; Yufei Wang; Xingshan Zeng; Liangyou Li; Yasheng Wang; Xin Jiang; Lifeng Shang; Ruiming Tang; Wei Wang

arXiv:2408.07471·cs.CL·February 19, 2025

TL;DR

This paper introduces BMC, a framework that improves pairwise preference data for aligning large language models. It synthesizes a pseudo-winning response from the losing response and learns token-level correlations between the paired responses, and it outperforms standard DPO.

Contribution

The paper proposes a novel BMC framework that improves preference signal quality and models token-level correlations, leading to superior alignment performance over standard DPO.

Findings

01. BMC significantly outperforms DPO on QA, math, and instruction-following tasks.

02. Synthesizing pseudo-winning responses enhances preference signal consistency.

03. Modeling token-level correlations improves nuanced preference understanding.

Abstract

Direct preference optimization (DPO), a widely adopted offline preference optimization algorithm, aims to align large language models (LLMs) with human-desired behaviors using pairwise preference data. However, the generation of the winning response and the losing response within pairwise data are typically isolated, leading to weak correlations between them as well as suboptimal alignment performance. To address this issue, we propose an effective framework for Bridging and Modeling Correlations in pairwise data, named BMC. Firstly, we increase the consistency and informativeness of the pairwise preference signals through targeted modifications, synthesizing a pseudo-winning response by improving the losing response with the winning response as a reference. Secondly, we identify that DPO alone is insufficient to model these correlations and capture nuanced variations. Therefore, we…
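For orientation, the standard DPO objective that the abstract takes as its baseline can be sketched for a single preference pair as below. This is a minimal illustration of plain DPO only, not of BMC's additional token-level modeling, which the truncated abstract does not spell out. The function and argument names are hypothetical, and the inputs are assumed to be sequence-level log-probabilities (summed over tokens) under the policy and a frozen reference model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_logp_w: float,  # log pi_theta(y_w | x), winning response (assumed input)
    policy_logp_l: float,  # log pi_theta(y_l | x), losing response (assumed input)
    ref_logp_w: float,     # log pi_ref(y_w | x)
    ref_logp_l: float,     # log pi_ref(y_l | x)
    beta: float = 0.1,     # illustrative value, not taken from the paper
) -> float:
    """Standard DPO loss for one (winning, losing) pair."""
    # Implicit reward margin: how much more the policy prefers y_w over y_l
    # than the reference model does.
    margin = (policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l)
    return -log_sigmoid(beta * margin)


# When the policy equals the reference, the margin is zero and the loss is log 2.
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Raising the winner's log-probability and lowering the loser's reduces the loss.
improved = dpo_loss(-9.0, -13.0, -10.0, -12.0)
```

Note that the loss depends on the two responses only through their sequence-level log-probabilities, so nothing in it ties specific tokens of the winning response to specific tokens of the losing one.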

Code & Models

Repositories

YJiangcm/BMC (PyTorch, official)

Taxonomy

Topics: Multi-Criteria Decision Making

Methods: Direct Preference Optimization · ALIGN