V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization

Yuxi Xie; Guanzhen Li; Xiao Xu; Min-Yen Kan

arXiv:2411.02712·cs.CV·November 6, 2024

TL;DR

V-DPO is a novel training method that reduces hallucination in large vision-language models by emphasizing visual context learning through preference optimization, leading to improved alignment with visual inputs.

Contribution

The paper introduces V-DPO, a new preference learning approach that enhances visual context understanding in LVLMs, addressing hallucination issues caused by language priors.

Findings

01. V-DPO significantly reduces hallucination in LVLMs.

02. V-DPO outperforms baseline methods on hallucination benchmarks.

03. V-DPO effectively learns from image-contrast preference data.

Abstract

Large vision-language models (LVLMs) suffer from hallucination, resulting in misalignment between the output textual response and the input visual content. Recent research indicates that the over-reliance on the Large Language Model (LLM) backbone, as one cause of the LVLM hallucination, inherently introduces bias from language priors, leading to insufficient context attention to the visual inputs. We tackle this issue of hallucination by mitigating such over-reliance through preference learning. We propose Vision-guided Direct Preference Optimization (V-DPO) to enhance visual context learning at training time. To interpret the effectiveness and generalizability of V-DPO on different types of training data, we construct a synthetic dataset containing both response- and image-contrast preference pairs, compared against existing human-annotated hallucination samples. Our approach…
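
For readers unfamiliar with the objective that V-DPO builds on, the following is a minimal sketch of the standard Direct Preference Optimization loss for a single preference pair. It is not the vision-guided variant proposed in the paper, whose details are not given in the abstract. The function names, the `beta` default, and the idea of passing pre-summed sequence log-probabilities are illustrative assumptions.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; controls deviation from the reference model
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response,
    conditioned on the prompt (and, for an LVLM, on the image).
    """
    # Implicit rewards: scaled log-ratio of policy to frozen reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # The loss shrinks as the chosen response is preferred over the rejected one.
    return -log_sigmoid(chosen_reward - rejected_reward)


# Hypothetical log-probabilities, for illustration only.
neutral = dpo_loss(-5.0, -5.0, -5.0, -5.0)   # policy identical to reference
improved = dpo_loss(-4.0, -6.0, -5.0, -5.0)  # policy now favors the chosen response
print(neutral, improved)
```

When the policy matches the reference model, the loss equals log 2; it falls below that once the policy assigns relatively more probability to the preferred response than the reference does.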

Code & Models

Repositories

yuxixie/v-dpo (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Big Data and Digital Economy

Methods: Softmax · Attention Is All You Need