SGDPO: Self-Guided Direct Preference Optimization for Language Model Alignment

Wenqiao Zhu; Ji Liu; Lulu Wang; Jun Wu; Yulun Zhang

arXiv:2505.12435 · cs.LG · May 20, 2025

TL;DR

SGDPO is a self-guided variant of Direct Preference Optimization (DPO) for language model alignment. It adds a pilot term that controls how the chosen and rejected rewards are updated, with the aim of improving response quality and robustness. The method is supported by a theoretical analysis and by experiments reporting gains of up to 9.19% on benchmarks.

Contribution

The paper proposes SGDPO, a novel self-guided optimization algorithm that improves DPO's effectiveness and resilience in aligning language models with human preferences.

Findings

01. Up to 9.19% higher scores on benchmarks.

02. Theoretical analysis confirms the operational mechanism.

03. Experimental results demonstrate improved alignment performance.

Abstract

Direct Preference Optimization (DPO) is widely used to align Large Language Models (LLMs) with human values because of its flexibility. Despite its effectiveness, DPO's ability to generate human-preferred responses has been observed to be limited, and its results are far from resilient. To address these limitations, we propose a novel Self-Guided Direct Preference Optimization algorithm, SGDPO, which incorporates a pilot term to steer the gradient flow during optimization, allowing fine-grained control over the updates of the chosen and rejected rewards. We provide a detailed theoretical analysis of the proposed method and elucidate its operational mechanism. Furthermore, we conduct comprehensive experiments on various models and benchmarks. The extensive experimental results demonstrate the consistency between the empirical…
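
For context, the baseline that SGDPO modifies can be sketched as follows. This is the standard DPO objective for a single preference pair, not the SGDPO loss: the abstract does not give the form of the pilot term, so it is omitted here. The names `implicit_reward`, `dpo_loss`, and `beta` are illustrative assumptions, and the inputs are assumed to be summed sequence log-probabilities.

```python
import math


def implicit_reward(policy_logp, ref_logp, beta):
    """Implicit DPO reward: scaled log-ratio of policy to reference model."""
    return beta * (policy_logp - ref_logp)


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Returns (loss, chosen_reward, rejected_reward).
    The SGDPO pilot term is NOT included; its form is not given in the abstract.
    """
    chosen_reward = implicit_reward(policy_chosen_logp, ref_chosen_logp, beta)
    rejected_reward = implicit_reward(policy_rejected_logp, ref_rejected_logp, beta)
    margin = chosen_reward - rejected_reward

    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # computed in a numerically stable way for either sign of margin.
    if margin >= 0:
        loss = math.log1p(math.exp(-margin))
    else:
        loss = -margin + math.log1p(math.exp(margin))
    return loss, chosen_reward, rejected_reward


if __name__ == "__main__":
    # Hypothetical log-probabilities, chosen only for illustration.
    loss, r_chosen, r_rejected = dpo_loss(-12.0, -15.0, -13.0, -14.0, beta=0.1)
    print(loss, r_chosen, r_rejected)
```

When the policy matches the reference model, both implicit rewards are zero and the loss equals log 2. Note that this loss depends only on the margin between the two rewards, not on either reward individually; the abstract describes the pilot term as adding control over how each of them is updated.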

Taxonomy

Topics: Machine Learning and Data Classification · Topic Modeling · Recommender Systems and Techniques

Methods: Direct Preference Optimization