TL;DR
FocalPO is a novel preference optimization method that improves alignment of language models by focusing on correctly ranked preference pairs, outperforming existing methods on standard benchmarks.
Contribution
We introduce FocalPO, a DPO variant inspired by Focal Loss that down-weights misranked preference pairs and instead strengthens the model's understanding of pairs it already ranks correctly.
Findings
FocalPO outperforms DPO on the Alpaca Eval 2.0 benchmark.
FocalPO rebalances training between correctly ranked and misranked preference pairs, placing less weight on the misranked ones.
The hyperparameter for FocalPO remains fixed across experiments.
Abstract
Efficient preference optimization algorithms such as Direct Preference Optimization (DPO) have become a popular approach to aligning large language models (LLMs) with human preferences. These algorithms implicitly treat the LLM as a reward model and focus on training it to correct misranked preference pairs. However, recent work~\citep{chen2024preference} empirically finds that DPO training \textit{rarely improves these misranked preference pairs}, despite its gradient emphasizing these cases. We introduce FocalPO, a DPO variant that instead \textit{down-weights} misranked preference pairs and prioritizes enhancing the model's understanding of pairs that it can already rank correctly. Inspired by Focal Loss used in vision tasks, FocalPO achieves this by adding a modulating factor to dynamically scale the DPO loss. Our experiments demonstrate that FocalPO surpasses DPO and its variants on…
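To make the idea of a modulating factor concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not give the exact form of the factor, so the sketch assumes one plausible choice: scaling the per-pair DPO loss by p^γ, where p is the implicit-reward probability that the chosen response is ranked above the rejected one. The names `implicit_margin`, `dpo_loss`, `focal_style_loss`, and the parameters `beta` and `gamma` are illustrative assumptions.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def implicit_margin(pol_w: float, pol_l: float,
                    ref_w: float, ref_l: float,
                    beta: float = 0.1) -> float:
    """Standard DPO implicit reward margin for one preference pair.

    Inputs are sequence log-probabilities of the chosen (w) and rejected (l)
    responses under the policy and the reference model.
    """
    return beta * ((pol_w - ref_w) - (pol_l - ref_l))


def dpo_loss(margin: float) -> float:
    """Per-pair DPO loss: -log sigmoid(margin)."""
    return -log_sigmoid(margin)


def focal_style_loss(margin: float, gamma: float = 1.0) -> float:
    """DPO loss scaled by an ASSUMED modulating factor p**gamma.

    p = sigmoid(margin) is small for misranked pairs (margin < 0), so their
    loss is scaled down; gamma = 0 recovers the plain DPO loss.
    """
    log_p = log_sigmoid(margin)
    p = math.exp(log_p)
    return -(p ** gamma) * log_p


if __name__ == "__main__":
    for m in (-2.0, 0.0, 2.0):
        print(f"margin={m:+.1f}  dpo={dpo_loss(m):.4f}  "
              f"focal_style={focal_style_loss(m):.4f}")
```

Under this assumption, a pair with a negative margin (misranked) receives a smaller scale factor than a pair with a positive margin, which matches the down-weighting described in the abstract. Whether the factor should be excluded from the gradient computation is not stated here, so the sketch only computes loss values.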
Taxonomy
Methods: Focal Loss · Direct Preference Optimization · Focus
