LPOI: Listwise Preference Optimization for Vision Language Models

Fatemeh Pesaran Zadeh; Yoojin Oh; Gunhee Kim

arXiv:2505.21061 · cs.CV · May 28, 2025

TL;DR

LPOI introduces a novel listwise preference optimization method for vision-language models that reduces hallucinations by automatically constructing ranked image lists through object masking and interpolation, improving alignment with human preferences.

Contribution

This work is the first to employ object-aware listwise preference optimization for VLMs, addressing hallucinations without requiring extra annotations beyond standard preference data.

Findings

01. LPOI outperforms existing methods in reducing hallucinations.
02. LPOI enhances VLM performance on benchmark datasets.
03. The method requires no additional annotations beyond pairwise preferences.

Abstract

Aligning large VLMs with human preferences is a challenging task, as methods like RLHF and DPO often overfit to textual information or exacerbate hallucinations. Although augmenting negative image samples partially addresses these pitfalls, no prior work has employed listwise preference optimization for VLMs, due to the complexity and cost of constructing listwise image samples. In this work, we propose LPOI, the first object-aware listwise preference optimization developed for reducing hallucinations in VLMs. LPOI identifies and masks a critical object in the image, and then interpolates the masked region between the positive and negative images to form a sequence of incrementally more complete images. The model is trained to rank these images in ascending order of object visibility, effectively reducing hallucinations while retaining visual fidelity. LPOI requires no extra annotations…
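
The list construction and ranking objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a linear pixel-space blend inside the masked region and a Plackett-Luce style listwise loss over scalar scores, neither of which the abstract confirms. The names `build_image_list`, `listwise_loss`, and `num_steps` are hypothetical and chosen for illustration only.

```python
import numpy as np


def build_image_list(positive, negative, mask, num_steps=4):
    """Return images ordered from least to most object visibility.

    positive: H x W x C float array, the original image (object visible).
    negative: H x W x C float array, the same image with the object masked out.
    mask:     H x W bool array, True inside the object's region.

    Assumption: the masked region is blended linearly between the two images.
    """
    alphas = np.linspace(0.0, 1.0, num_steps)  # 0 = negative, 1 = positive
    region = mask[..., None]                   # broadcast the mask over channels
    images = []
    for alpha in alphas:
        blended = (1.0 - alpha) * negative + alpha * positive
        # Only pixels inside the mask change; the rest is copied from the positive image.
        images.append(np.where(region, blended, positive))
    return images


def listwise_loss(scores):
    """Plackett-Luce negative log-likelihood for a list ordered best-first.

    scores[i] is a scalar model score for the i-th item; item 0 should rank highest.
    Assumption: this stands in for whatever listwise objective the paper uses.
    """
    scores = np.asarray(scores, dtype=np.float64)
    loss = 0.0
    for i in range(len(scores) - 1):
        tail = scores[i:]
        peak = tail.max()
        # Numerically stable log-sum-exp over the items not yet ranked.
        log_sum_exp = peak + np.log(np.exp(tail - peak).sum())
        loss += log_sum_exp - scores[i]
    return loss
```

In this sketch, scores that already follow the intended ordering give a lower loss than scores in the reverse order, so minimizing the loss pushes a model toward ranking images by how visible the object is.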

Code & Models

Repositories

fatemehpesaran310/lpoi (PyTorch, official)

Videos

LPOI: Listwise Preference Optimization for Vision Language Models

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Semantic Web and Ontologies

Methods: Direct Preference Optimization