LeanPO: Lean Preference Optimization for Likelihood Alignment in Video-LLMs

Xiaodong Wang; Jinfa Huang; Li Yuan; Peixi Peng

arXiv:2506.05260 · cs.CV · June 6, 2025

TL;DR

This paper introduces LeanPO, a novel preference optimization method for Video-LLMs that addresses likelihood displacement issues, improves alignment with human preferences, and enhances model performance with minimal overhead.

Contribution

LeanPO reformulates the implicit reward for Video-LLMs as a reference-free average likelihood, and it incorporates self-generated preference data and dynamic label smoothing to improve alignment and mitigate the drop in response likelihood during training.
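The contribution mentions dynamic label smoothing. As a rough sketch, the snippet below shows the standard label-smoothed Bradley-Terry (logistic) preference loss, as used in conservative DPO variants, where `eps` is the assumed probability that a preference label is flipped. How LeanPO actually sets its smoothing dynamically is not stated here, so `hypothetical_dynamic_eps`, which smooths more when the reward gap is small, is purely illustrative and not the paper's rule.

```python
import math


def log_sigmoid(z: float) -> float:
    # Numerically stable log(sigmoid(z)).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def smoothed_preference_loss(margin: float, eps: float) -> float:
    # Label-smoothed logistic loss on a reward margin r(y_w) - r(y_l):
    # the label "y_w is preferred" is trusted with prob (1 - eps)
    # and treated as flipped with prob eps.
    if not 0.0 <= eps < 0.5:
        raise ValueError("eps must be in [0, 0.5)")
    return -(1.0 - eps) * log_sigmoid(margin) - eps * log_sigmoid(-margin)


def hypothetical_dynamic_eps(reward_gap: float, eps_max: float = 0.3, tau: float = 1.0) -> float:
    # HYPOTHETICAL schedule (not from the paper): apply more smoothing
    # when the pair looks ambiguous, i.e. the reward gap is small.
    return eps_max * math.exp(-abs(reward_gap) / tau)


if __name__ == "__main__":
    for gap in (0.0, 1.0, 5.0):
        eps = hypothetical_dynamic_eps(gap)
        print(f"gap={gap:.1f} eps={eps:.3f} loss={smoothed_preference_loss(gap, eps):.3f}")
```

With `eps > 0` the loss is minimized at the finite margin $\log((1-\epsilon)/\epsilon)$ instead of being pushed toward $+\infty$, which is the usual motivation for smoothing when preference pairs may be noisy, as self-generated pairs can be.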

Findings

1. Significantly improves Video-LLM performance across various models.
2. Effectively mitigates likelihood displacement during training.
3. Better aligns Video-LLM preferences with human trustworthiness.

Abstract

Most Video Large Language Models (Video-LLMs) adopt preference alignment techniques, e.g., DPO (Rafailov et al., 2024), to optimize the reward margin between a winning response ($y_w$) and a losing response ($y_l$). However, the likelihood displacement observed in DPO indicates that both $\log \pi_\theta(y_w \mid x)$ and $\log \pi_\theta(y_l \mid x)$ often decrease during training, inadvertently boosting the probabilities of non-target responses. In this paper, we systematically revisit this phenomenon from LLMs to Video-LLMs, showing that it intensifies when dealing with the redundant complexity of video content. To alleviate the impact of this phenomenon, we propose Lean Preference Optimization (LeanPO), a reference-free approach that reformulates the implicit reward as the average likelihood of the response with respect to the policy model. A key component of LeanPO is the…
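To make the abstract's contrast concrete, here is a minimal Python sketch of the standard DPO loss next to a reference-free loss built on a length-normalized likelihood reward. The reward form `beta * exp(mean token log-prob)` is an assumption, one reading of "average likelihood of the response"; the abstract is truncated and does not give LeanPO's exact formula, scaling, or additional terms. The function names (`dpo_loss`, `avg_likelihood_reward`, `reference_free_loss`) are hypothetical, and the numbers in the demo are made up to replay likelihood displacement.

```python
import math


def log_sigmoid(z: float) -> float:
    # Numerically stable log(sigmoid(z)).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # DPO: implicit reward r(y) = beta * (log pi_theta(y|x) - log pi_ref(y|x));
    # the loss only sees the margin r(y_w) - r(y_l).
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -log_sigmoid(margin)


def avg_likelihood_reward(token_logps, beta=1.0):
    # ASSUMED form: beta * exp(mean per-token log-prob) under the policy,
    # i.e. the geometric-mean token likelihood; no reference model needed.
    return beta * math.exp(sum(token_logps) / len(token_logps))


def reference_free_loss(token_logps_w, token_logps_l, beta=1.0):
    # Logistic loss on the margin between the two assumed rewards.
    margin = avg_likelihood_reward(token_logps_w, beta) - avg_likelihood_reward(token_logps_l, beta)
    return -log_sigmoid(margin)


if __name__ == "__main__":
    # Illustrative sequence log-likelihoods (not from the paper).
    ref_w, ref_l = -10.0, -12.0
    before = dpo_loss(ref_w, ref_l, ref_w, ref_l)  # policy == reference
    after = dpo_loss(-14.0, -20.0, ref_w, ref_l)   # both log-probs dropped
    print(f"{before:.3f} {after:.3f}")  # prints 0.693 0.513
```

In the demo, the DPO margin rises from 0 to 0.4 because the losing response's log-likelihood falls further (by 8) than the winning one's (by 4), so the loss decreases even though $\log \pi_\theta(y_w \mid x)$ also drops. Under the assumed reward, each response's score is a monotone function of its own per-token likelihood under the policy, bounded in $(0, \beta]$, so the winner's reward term can only grow if its likelihood grows.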

Code & Models

Repositories

wang-xiaodong1899/leanpo (official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning

Methods: Direct Preference Optimization · Label Smoothing · ADaptive gradient method with the OPTimal convergence rate