Inducing Robustness in a 2 Dimensional Direct Preference Optimization Paradigm
Sarvesh Shashidhar, Ritik, Nachiketa Patil, Suraj Racha, Ganesh Ramakrishnan

TL;DR
This paper enhances Direct Preference Optimization (DPO) for large language models by adopting a 2D scoring paradigm and adding robustness to label noise, improving alignment accuracy with human preferences.
Contribution
It explores the 2D-DPO alignment method and adds robustness to noise in segment-level scores, backed by theoretical analysis and empirical validation.
Findings
2D-DPO outperforms standard DPO in preference alignment.
Robustness to label noise improves the stability of DPO.
Empirical results confirm the effectiveness of the proposed noise models.
Abstract
Direct Preference Optimization (DPO) has emerged as a powerful method for aligning Large Language Models (LLMs) with human preferences, offering a stable and efficient alternative to approaches based on Reinforcement Learning from Human Feedback (RLHF). In this work, we investigate the performance of DPO using open-source preference datasets. A major drawback of DPO is that it does not score responses at a granular level: it treats all segments of a response as equally preferred. This does not match human preferences in practice, since even "good" responses contain segments that the annotator may not prefer. To resolve this, a 2-dimensional scoring scheme for DPO alignment, called 2D-DPO, was proposed. We explore the 2D-DPO alignment paradigm and the advantages it provides over standard DPO by comparing their win rates. It is observed that these methods, even though effective,…
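For readers unfamiliar with the objective the abstract builds on, the standard DPO loss for a single preference pair can be sketched as below. This is a minimal illustration, not the authors' implementation: the function names, the `beta` default, and the toy log-probabilities are assumptions. The second helper, `segment_weighted_logratio`, is a hypothetical illustration of scoring segments unequally; it is not the 2D-DPO formula from the paper.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are sequence-level log-probabilities under the policy and a
    frozen reference model. beta=0.1 is an assumed default.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(logits)) == softplus(-logits)
    return softplus(-logits)


def segment_weighted_logratio(policy_seg_logps, ref_seg_logps, seg_scores):
    """Hypothetical: weight each segment's log-ratio by a per-segment score.

    With all scores equal to 1 this reduces to the sequence-level log-ratio
    used by standard DPO, i.e. every segment counts equally.
    """
    return sum(s * (p - r)
               for p, r, s in zip(policy_seg_logps, ref_seg_logps, seg_scores))


if __name__ == "__main__":
    # Toy numbers (assumed): the policy favors the chosen response more
    # than the reference does, so the loss falls below log(2).
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
    print(segment_weighted_logratio([-1.0, -2.0], [-1.5, -2.5], [1.0, 1.0]))
```

When the policy and reference agree on both responses, the loss equals log 2, which is the starting point of training when the policy is initialized from the reference.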
Taxonomy
Topics: Advanced Multi-Objective Optimization Algorithms · Evolutionary Algorithms and Applications · Multi-Criteria Decision Making
Methods: Direct Preference Optimization
