Inducing Robustness in a 2 Dimensional Direct Preference Optimization Paradigm

Sarvesh Shashidhar; Ritik; Nachiketa Patil; Suraj Racha; Ganesh Ramakrishnan

arXiv:2505.01706·cs.AI·May 6, 2025

Open Access

TL;DR

This paper studies the 2D-DPO paradigm, which extends Direct Preference Optimization (DPO) for large language models with 2-dimensional, segment-level scoring, and adds robustness to label noise to improve alignment with human preferences.

Contribution

It examines the 2D-DPO alignment method and incorporates robustness to noise in segment-level scores, backed by theoretical analysis and empirical validation.

Findings

01 2D-DPO outperforms standard DPO in preference alignment.

02 Robustness to label noise improves the stability of DPO.

03 Empirical results confirm the effectiveness of the proposed noise models.

Abstract

Direct Preference Optimization (DPO) has emerged as a powerful method for aligning Large Language Models (LLMs) with human preferences, offering a stable and efficient alternative to approaches based on Reinforcement Learning from Human Feedback (RLHF). In this work, we investigate the performance of DPO using open-source preference datasets. A major drawback of DPO is that it does not score responses at a granular level: it treats all segments of a response as equally important. This does not match human preferences in practice, since even "good" responses contain segments that the annotator may not prefer. To address this, a 2-dimensional scoring scheme for DPO alignment, called 2D-DPO, was proposed. We explore the 2D-DPO alignment paradigm and its advantages over standard DPO by comparing their win rates. It is observed that these methods, even though effective,…
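To make the contrast in the abstract concrete, here is a minimal sketch in plain Python. The first function is the standard DPO loss for a single preference pair. The second is a hypothetical segment-weighted variant in which each segment's log-ratio is scaled by a per-segment score; it only illustrates the idea of not treating all segments equally and is not the 2D-DPO objective or the robust variant from the paper. The function names, the `beta` default, and the weighting scheme are all assumptions made for illustration.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    logp_* are sequence log-probabilities under the policy, ref_logp_*
    under the frozen reference model. beta=0.1 is an assumed default.
    """
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -log_sigmoid(beta * margin)


def weighted_log_ratio(policy_seg_logps, ref_seg_logps, seg_scores):
    """Hypothetical helper: score-weighted sum of per-segment log-ratios."""
    return sum(
        s * (p - r)
        for p, r, s in zip(policy_seg_logps, ref_seg_logps, seg_scores)
    )


def segment_weighted_dpo_loss(chosen, rejected, beta=0.1):
    """Illustrative segment-weighted variant (an assumption, not the paper's loss).

    chosen and rejected are tuples of three equal-length lists:
    (policy segment log-probs, reference segment log-probs, segment scores).
    """
    margin = weighted_log_ratio(*chosen) - weighted_log_ratio(*rejected)
    return -log_sigmoid(beta * margin)


# With every segment score set to 1, the weighted variant reduces to
# standard DPO applied to the summed segment log-probabilities.
chosen = ([-1.0, -2.0], [-1.5, -2.5], [1.0, 1.0])
rejected = ([-3.0], [-2.0], [1.0])
uniform = segment_weighted_dpo_loss(chosen, rejected)
standard = dpo_loss(-3.0, -3.0, -4.0, -2.0)
```

In this sketch, lowering the score of a weak segment in the chosen response shrinks that segment's contribution to the margin, which is one simple way to express that not every segment of a "good" response is equally preferred.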

Taxonomy

Topics: Advanced Multi-Objective Optimization Algorithms · Evolutionary Algorithms and Applications · Multi-Criteria Decision Making

Methods: Direct Preference Optimization