Mind the Gap: Structure-Aware Consistency in Preference Learning

Mehryar Mohri; Yutao Zhong

arXiv:2604.27733 · cs.LG · May 1, 2026

TL;DR

This paper shows that the standard surrogate losses used in preference learning for LLM alignment are inconsistent for the hypothesis sets typical of neural networks. It proposes a margin-shifted, structure-aware framework with consistency guarantees and a new objective, SA-DPO.

Contribution

The paper introduces a margin-shifted ranking framework with structure-aware $H$-consistency bounds, and a new objective (SA-DPO) that adapts the margin to the semantic distance between responses.

Findings

01. Standard surrogates are inconsistent for neural network hypothesis sets.

02. The proposed SA-DPO adapts margins to semantic distances, improving alignment.

03. Heavy-tailed surrogates outperform logistic loss in capacity-bounded models.

Abstract

Preference learning has become the foundation of aligning Large Language Models (LLMs) with human intent. Popular methods, such as Direct Preference Optimization (DPO), minimize surrogate losses as proxies for the intractable pairwise ranking loss. However, we demonstrate that for the equicontinuous hypothesis sets typical of neural networks, these standard surrogates are theoretically inconsistent, yielding vacuous generalization guarantees. To resolve this, we formulate LLM alignment within a margin-shifted ranking framework. We derive rigorous $H$-consistency bounds that depend on enforcing a separation margin $\gamma$. Crucially, we extend this to Structure-Aware $H$-consistency, introducing a novel objective (SA-DPO) that adapts the margin based on the semantic distance between responses to handle synonyms and hard pairs. Finally, we analyze the trade-off between consistency and…
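To make the margin-shifted idea concrete, here is a minimal Python sketch. It is not the paper's SA-DPO objective, whose exact form is not given on this page. It assumes the usual DPO logistic loss on the implicit-reward gap, shifts that gap by a margin, and uses a hypothetical schedule in which the margin is `gamma` times a semantic distance in [0, 1]. The names `dpo_loss`, `structure_aware_margin`, `beta`, and `distance` are illustrative assumptions.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logratio_chosen: float, logratio_rejected: float,
             beta: float = 0.1, margin: float = 0.0) -> float:
    """Logistic loss on the DPO implicit-reward gap, shifted by `margin`.

    Each logratio is log pi_theta(y | x) - log pi_ref(y | x).
    With margin = 0 this is the usual DPO loss, -log sigmoid(gap).
    The margin shift is an assumed form, not taken from the paper.
    """
    gap = beta * (logratio_chosen - logratio_rejected)
    return softplus(margin - gap)  # equals -log sigmoid(gap - margin)


def structure_aware_margin(distance: float, gamma: float = 1.0) -> float:
    """Hypothetical schedule: margin grows linearly with semantic distance.

    `distance` is assumed to lie in [0, 1], with 0 meaning the two
    responses are near-synonyms (so no separation is demanded).
    """
    if not 0.0 <= distance <= 1.0:
        raise ValueError("distance must lie in [0, 1]")
    return gamma * distance


# Same log-ratios, two different semantic distances between the responses.
near_synonyms = dpo_loss(2.0, -1.0, margin=structure_aware_margin(0.05))
far_apart = dpo_loss(2.0, -1.0, margin=structure_aware_margin(0.9))
print(near_synonyms, far_apart)
```

Under these assumptions, a pair of semantically distant responses incurs a larger loss than a near-synonym pair with the same log-ratios, so the model is pushed to separate distant pairs by a wider gap while near-synonyms are left almost unconstrained.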
