FocalPO: Enhancing Preference Optimizing by Focusing on Correct Preference Rankings

Tong Liu; Xiao Yu; Wenxuan Zhou; Jindong Gu; Volker Tresp

arXiv:2501.06645 · cs.CL · July 29, 2025

TL;DR

FocalPO is a novel preference optimization method that improves alignment of language models by focusing on correctly ranked preference pairs, outperforming existing methods on standard benchmarks.

Contribution

We introduce FocalPO, a DPO variant inspired by Focal Loss that down-weights misranked pairs, so that training concentrates on improving the model's understanding of pairs it already ranks correctly.
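For reference, the standard DPO loss and the Focal Loss it draws on are written out below. The last line is a hedged reading of the description above, assuming the modulating factor is p^γ (the reverse of Focal Loss's (1 - p)^γ), so that pairs with a small probability p of being ranked correctly contribute less. The paper's exact formulation may differ.

```latex
% Implicit reward margin and the model's probability of ranking the pair correctly
h_\theta = \log\frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
         - \log\frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)},
\qquad p = \sigma(\beta\, h_\theta)

% DPO
\mathcal{L}_{\mathrm{DPO}} = -\log p

% Focal Loss (vision): (1 - p)^\gamma shrinks the loss of easy, well-classified examples
\mathcal{L}_{\mathrm{Focal}} = -(1 - p)^{\gamma} \log p

% Assumed FocalPO form: p^\gamma shrinks the loss of misranked pairs (h_\theta < 0, i.e. p < 1/2)
\mathcal{L}_{\mathrm{FocalPO}} = -\,p^{\gamma} \log p, \qquad \gamma \ge 0 \;(\gamma = 0 \text{ recovers DPO})
```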

Findings

01

FocalPO outperforms DPO on the AlpacaEval 2.0 benchmark.

02

FocalPO effectively balances training on correct and incorrect preference pairs.

03

The hyperparameter for FocalPO remains fixed across experiments.
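To illustrate the balancing claim in finding 02 at the gradient level, the sketch below compares the per-pair gradient of the DPO loss with respect to the margin h against that of an assumed FocalPO form L = -p^γ log p, with p = σ(βh). The functions `dpo_grad` and `focalpo_grad`, and the values β = 0.1 and γ = 0.05, are hypothetical illustrations, not the authors' code or reported settings.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function sigma(z) = 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dpo_grad(margin: float, beta: float = 0.1) -> float:
    """d/dh of the DPO loss -log sigma(beta*h); equals -beta * (1 - p)."""
    p = sigmoid(beta * margin)
    return -beta * (1.0 - p)

def focalpo_grad(margin: float, beta: float = 0.1, gamma: float = 0.05) -> float:
    """d/dh of the ASSUMED FocalPO loss -p**gamma * log p, with p = sigma(beta*h).

    Chain rule: dL/dp = -p**(gamma - 1) * (1 + gamma * log p)
                dp/dh = beta * p * (1 - p)
    """
    p = sigmoid(beta * margin)
    return -beta * p**gamma * (1.0 - p) * (1.0 + gamma * math.log(p))

# Negative margins are misranked pairs, positive margins are correctly ranked ones.
for h in (-30.0, -10.0, 0.0, 10.0, 30.0):
    ratio = focalpo_grad(h) / dpo_grad(h)
    print(f"margin {h:+6.1f}: DPO |grad| = {abs(dpo_grad(h)):.4f}, "
          f"FocalPO/DPO ratio = {ratio:.3f}")
```

Under this assumed form the ratio FocalPO/DPO equals p^γ(1 + γ log p), which shrinks as p falls: misranked pairs receive proportionally smaller updates, while confidently correct pairs are nearly unaffected. The factor (1 + γ log p) would only turn negative for p < exp(-1/γ), which is negligible for small γ.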

Abstract

Efficient preference optimization algorithms such as Direct Preference Optimization (DPO) have become a popular approach for aligning large language models (LLMs) with human preferences. These algorithms implicitly treat the LLM as a reward model and focus on training it to correct misranked preference pairs. However, recent work (Chen et al., 2024) empirically finds that DPO training rarely improves these misranked preference pairs, despite its gradient emphasizing these cases. We introduce FocalPO, a DPO variant that instead down-weights misranked preference pairs and prioritizes enhancing the model's understanding of pairs that it can already rank correctly. Inspired by Focal Loss used in vision tasks, FocalPO achieves this by adding a modulating factor that dynamically scales the DPO loss. Our experiments demonstrate that FocalPO surpasses DPO and its variants on…
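The "modulating factor that dynamically scales the DPO loss" can be sketched per preference pair in plain Python. This assumes summed sequence log-probabilities are already available for the policy and the frozen reference model, and assumes the factor is p^γ with p = σ(βh). The names `implicit_margin`, `dpo_loss`, `focalpo_loss`, and the default `beta`/`gamma` values are hypothetical illustrations, not the authors' implementation or reported settings.

```python
import math

def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigma(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def implicit_margin(pi_w: float, pi_l: float, ref_w: float, ref_l: float) -> float:
    """DPO's implicit reward margin h from sequence log-probabilities."""
    return (pi_w - ref_w) - (pi_l - ref_l)

def dpo_loss(pi_w, pi_l, ref_w, ref_l, beta=0.1):
    """Standard DPO loss for one pair: -log sigma(beta * h)."""
    return -log_sigmoid(beta * implicit_margin(pi_w, pi_l, ref_w, ref_l))

def focalpo_loss(pi_w, pi_l, ref_w, ref_l, beta=0.1, gamma=0.05):
    """ASSUMED FocalPO form: the DPO loss scaled by p**gamma; gamma=0 recovers DPO."""
    log_p = log_sigmoid(beta * implicit_margin(pi_w, pi_l, ref_w, ref_l))
    modulating = math.exp(gamma * log_p)  # p**gamma, computed in log space
    return -modulating * log_p

# Correctly ranked pair: the policy raised y_w and lowered y_l vs. the reference (h = +20).
correct = dict(pi_w=-40.0, pi_l=-60.0, ref_w=-50.0, ref_l=-50.0)
# Misranked pair: the opposite shift (h = -20).
misranked = dict(pi_w=-60.0, pi_l=-40.0, ref_w=-50.0, ref_l=-50.0)

for name, pair in (("correct", correct), ("misranked", misranked)):
    d, f = dpo_loss(**pair), focalpo_loss(**pair)
    print(f"{name:>9}: DPO {d:.4f}, FocalPO {f:.4f}, scale {f / d:.3f}")
```

With these illustrative values the misranked pair's loss is scaled by roughly 0.90 and the correctly ranked pair's by roughly 0.99, matching the stated intent of down-weighting misranked pairs. Computing p^γ as exp(γ log p) reuses the stable log-sigmoid and avoids underflow for very negative margins.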

Videos

FocalPO: Enhancing Preference Optimizing by Focusing on Correct Preference Rankings (Underline)

Taxonomy

Methods: Focal Loss · Direct Preference Optimization · Focus