SP^2DPO: An LLM-assisted Semantic Per-Pair DPO Generalization

Chaoyue He; Xin Zhou; Di Wang; Hong Xu; Wei Liu; Chunyan Miao

arXiv:2601.22385 · cs.CL · February 2, 2026

Open Access

TL;DR

SP2DPO replaces DPO's single global temperature with a pair-specific beta_i, fixed offline from semantic annotations produced by teacher language models. It performs competitively with global-beta DPO baselines and adds no training overhead.

Contribution

The paper introduces an instance-specific beta schedule for DPO, derived from structured semantic-gap annotations (category, magnitude, confidence), to better handle heterogeneous preference data.

Findings

01. SP2DPO performs competitively with global-beta DPO baselines.

02. It improves length-controlled win rate on some models.

03. Zero training overhead is incurred by the method.

Abstract

Direct Preference Optimization (DPO) controls the trade-off between fitting preference labels and staying close to a reference model using a single global temperature beta, implicitly treating all preference pairs as equally informative. Real-world preference corpora are heterogeneous: they mix high-signal, objective failures (for example, safety, factuality, instruction violations) with low-signal or subjective distinctions (for example, style), and also include label noise. We introduce our method, SP2DPO (Semantic Per-Pair DPO), a generalization that replaces the global temperature with an instance-specific schedule beta_i pre-decided offline from structured semantic-gap annotations (category, magnitude, confidence) produced by teacher language models. We instantiate this procedure on the UltraFeedback preference corpus (59,960 pairs), enabling large-scale construction of an…
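To make the per-pair idea concrete, here is a minimal sketch of a DPO loss in which each pair carries its own beta_i. It assumes the standard DPO objective, -log sigmoid(beta * margin), with the global beta swapped for a per-pair value. The mapping from annotations to beta_i (`beta_from_annotation`, `CATEGORY_WEIGHT`, the `beta_min`/`beta_max` range, and even the direction of the mapping) is hypothetical and chosen only for illustration; the abstract does not specify it.

```python
import math
from dataclasses import dataclass


@dataclass
class Pair:
    # Sequence log-probabilities under the policy and the frozen reference.
    logp_chosen: float
    logp_rejected: float
    ref_logp_chosen: float
    ref_logp_rejected: float
    # Annotation fields named in the abstract; value ranges are assumptions.
    category: str
    magnitude: float   # assumed to lie in [0, 1]
    confidence: float  # assumed to lie in [0, 1]


# Hypothetical category weights, not taken from the paper.
CATEGORY_WEIGHT = {"safety": 1.0, "factuality": 1.0, "instruction": 0.8, "style": 0.3}


def beta_from_annotation(pair, beta_min=0.05, beta_max=0.5):
    """Illustrative offline schedule: interpolate beta_i from the annotation."""
    weight = CATEGORY_WEIGHT.get(pair.category, 0.5)
    signal = weight * pair.magnitude * pair.confidence
    return beta_min + (beta_max - beta_min) * signal


def dpo_pair_loss(pair, beta):
    """Standard DPO loss for one pair, evaluated at the given beta."""
    margin = (pair.logp_chosen - pair.ref_logp_chosen) - (
        pair.logp_rejected - pair.ref_logp_rejected
    )
    z = beta * margin
    # Numerically stable -log(sigmoid(z)), i.e. softplus(-z).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def per_pair_batch_loss(pairs, betas):
    """Mean loss over a batch, with one precomputed beta per pair."""
    return sum(dpo_pair_loss(p, b) for p, b in zip(pairs, betas)) / len(pairs)


pairs = [
    Pair(-1.0, -2.0, -1.5, -1.5, "safety", 1.0, 1.0),
    Pair(-1.0, -1.2, -1.1, -1.1, "style", 0.5, 0.5),
]
# The betas are computed once, before training, and then only looked up.
betas = [beta_from_annotation(p) for p in pairs]
loss = per_pair_batch_loss(pairs, betas)
```

Because the betas are fixed before training, the training loop only looks up one extra scalar per pair, which is consistent with the zero-overhead claim. Passing the same value for every entry of `betas` recovers ordinary global-beta DPO.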

Taxonomy

Topics: Constraint Satisfaction and Optimization · Machine Learning and Data Classification · Multi-Criteria Decision Making