wDPO: Winsorized Direct Preference Optimization for Robust LLM Alignment

Jilong Liu; Yonghui Yang; Pengyang Shao; Haokai Ma; Wei Qin; Richang Hong

arXiv:2603.07211 · cs.LG · March 10, 2026

TL;DR

wDPO introduces a hierarchical winsorization method to improve the robustness of large language model alignment by effectively handling different types of noisy preference data during training.

Contribution

The paper proposes wDPO, a novel hierarchical winsorization approach that targets specific noise types in preference data, enhancing robustness over existing DPO variants.

Findings

01. wDPO outperforms vanilla DPO and baselines on safety benchmarks.

02. wDPO shows significant robustness under label-flip noise.

03. Hierarchical interventions improve preference alignment quality.

Abstract

Direct Preference Optimization (DPO) aligns large language models by optimizing pairwise preferences and has shown remarkable effectiveness as a simple and scalable alternative to RLHF. However, in practice, preference data are often noisy. Existing robust variants of DPO mainly rely on uniform objective modifications or global reweighting. While partially effective, these methods treat noisy samples as a homogeneous source of uncertainty and fail to distinguish between different noise types, leading to sub-optimal alignment robustness. In this work, we show that robust preference alignment benefits from addressing different noise types with targeted interventions rather than uniform regularization. We propose winsorized Direct Preference Optimization (wDPO), a robust LLM alignment approach with hierarchical winsorization. Specifically, wDPO adopts a reward-free hierarchical…
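
For readers unfamiliar with the two ingredients named in the abstract, the sketch below shows the standard per-pair DPO loss and a generic upper-tail winsorization of a list of values. It is not the authors' hierarchical scheme, which the truncated abstract does not describe. The function names, the `beta` default, and the quantile cap are assumptions made for illustration only.

```python
import math

def dpo_losses(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-pair DPO loss, -log(sigmoid(beta * margin)).

    Inputs are sequence log-probabilities under the policy and the frozen
    reference model for the chosen and rejected responses.
    """
    losses = []
    for pc, pr, rc, rr in zip(policy_chosen, policy_rejected, ref_chosen, ref_rejected):
        # Implicit reward margin between chosen and rejected responses.
        x = beta * ((pc - rc) - (pr - rr))
        # -log(sigmoid(x)) = log(1 + exp(-x)), written in a numerically stable form.
        losses.append(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))
    return losses

def winsorize_upper(values, upper_quantile=0.9):
    """Generic winsorization: cap values above an empirical quantile.

    Hypothetical helper; the quantile rule here is a simple illustrative choice.
    """
    ordered = sorted(values)
    idx = min(len(ordered) - 1, int(upper_quantile * (len(ordered) - 1)))
    cap = ordered[idx]
    return [min(v, cap) for v in values]

# A pair whose preference label looks flipped produces a large loss,
# which the cap limits so it cannot dominate the batch average.
losses = dpo_losses(
    policy_chosen=[-10.0, -12.0, -11.0, -90.0],
    policy_rejected=[-14.0, -15.0, -13.0, -10.0],
    ref_chosen=[-11.0, -12.5, -11.5, -12.0],
    ref_rejected=[-13.0, -14.0, -12.5, -12.0],
)
capped = winsorize_upper(losses, upper_quantile=0.75)
```

Capping a loss value directly removes the gradient from the capped samples, so a practical method would need to choose carefully what quantity is winsorized; the sketch only illustrates the clipping idea.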

Taxonomy

Topics: Machine Learning and Data Classification · Constraint Satisfaction and Optimization · Advanced Multi-Objective Optimization Algorithms