Lightweight Robust Direct Preference Optimization
Cheol Woo Kim, Shresth Verma, Mauricio Tec, Milind Tambe

TL;DR
This paper introduces DPO-PRO, a lightweight, robust fine-tuning method for large language models. It improves resistance to noisy preference data by accounting for uncertainty in the preference distribution, at negligible additional computational cost.
Contribution
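The paper proposes DPO-PRO (DPO with Preference Robustness), a DPO-based fine-tuning algorithm built on a lightweight distributionally robust optimization (DRO) formulation. By restricting the robustness to uncertainty in preferences, it makes DPO less sensitive to noisy preference data while avoiding the excessive conservatism and high computational cost of prior DRO-based variants.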
Findings
DPO-PRO outperforms existing DPO variants on standard benchmarks.
It effectively reduces overfitting caused by noisy preference signals.
The method incurs negligible additional computational cost.
Abstract
Direct Preference Optimization (DPO) has become a popular method for fine-tuning large language models (LLMs) due to its stability and simplicity. However, it is also known to be sensitive to noise in the data and prone to overfitting. Recent works have proposed using distributionally robust optimization (DRO) to address potential noise and distributional shift in the data. However, these methods often suffer from excessive conservatism and high computational cost. We propose DPO-PRO (DPO with Preference Robustness), a robust fine-tuning algorithm based on DPO which accounts for uncertainty in the preference distribution through a lightweight DRO formulation. Unlike prior DRO-based variants, DPO-PRO focuses solely on uncertainty in preferences, avoiding unnecessary conservatism and incurring negligible computational overhead. We further show that DPO-PRO is equivalent to a regularized…
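To make the idea of robustness over preference uncertainty concrete, here is a minimal sketch in plain Python. It computes the standard DPO loss generalized to a soft preference label q (the probability that the first response is preferred), and then a worst-case loss when q is only known up to a tolerance rho. The interval uncertainty set, the parameter rho, and all function names are assumptions made for illustration; the abstract does not specify the uncertainty set DPO-PRO uses, so this should not be read as the paper's formulation.

```python
import math


def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_margin(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # Scaled difference of policy-vs-reference log-ratios for the
    # preferred (w) and dispreferred (l) responses, as in standard DPO.
    return beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))


def soft_dpo_loss(h, q):
    # Cross-entropy between the soft label q and the implied preference
    # probability sigmoid(h). With q = 1 this is the usual DPO loss.
    return -q * log_sigmoid(h) - (1.0 - q) * log_sigmoid(-h)


def worst_case_dpo_loss(h, q, rho):
    # ASSUMPTION (illustrative only): the true preference probability lies
    # in [q - rho, q + rho], clipped to [0, 1]. The loss is linear in q,
    # so its maximum over the interval is attained at an endpoint.
    lo = max(0.0, q - rho)
    hi = min(1.0, q + rho)
    return max(soft_dpo_loss(h, lo), soft_dpo_loss(h, hi))


if __name__ == "__main__":
    h = dpo_margin(-1.0, -2.0, -1.5, -1.5, beta=1.0)
    print("margin:", h)
    print("nominal loss:", soft_dpo_loss(h, 0.8))
    print("worst-case loss:", worst_case_dpo_loss(h, 0.8, 0.1))
```

Because the adversary in this sketch only perturbs a single scalar per example, the worst case has a closed form and adds essentially no computation on top of the nominal loss, which is the kind of property the abstract describes as lightweight.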
