V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization
Yuxi Xie, Guanzhen Li, Xiao Xu, Min-Yen Kan

TL;DR
V-DPO is a novel training method that reduces hallucination in large vision-language models by emphasizing visual context learning through preference optimization, leading to improved alignment with visual inputs.
Contribution
The paper introduces V-DPO, a new preference learning approach that enhances visual context understanding in LVLMs, addressing hallucination issues caused by language priors.
Findings
V-DPO significantly reduces hallucination in LVLMs.
V-DPO outperforms baseline methods on hallucination benchmarks.
V-DPO effectively learns from image-contrast preference data.
Abstract
Large vision-language models (LVLMs) suffer from hallucination, resulting in misalignment between the output textual response and the input visual content. Recent research indicates that the over-reliance on the Large Language Model (LLM) backbone, as one cause of the LVLM hallucination, inherently introduces bias from language priors, leading to insufficient context attention to the visual inputs. We tackle this issue of hallucination by mitigating such over-reliance through preference learning. We propose Vision-guided Direct Preference Optimization (V-DPO) to enhance visual context learning at training time. To interpret the effectiveness and generalizability of V-DPO on different types of training data, we construct a synthetic dataset containing both response- and image-contrast preference pairs, compared against existing human-annotated hallucination samples. Our approach…
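For readers unfamiliar with the objective that V-DPO builds on, the following is a minimal sketch of the standard Direct Preference Optimization loss for a single preference pair. It is not the authors' vision-guided variant, whose exact form is not given in this summary. The function names, argument names, and the default `beta` are illustrative assumptions. Under the assumption that each log-probability is conditioned on an image, a response-contrast pair would score two different responses against the same image, and an image-contrast pair would score the same response against two different images.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value; controls deviation from the reference model
) -> float:
    """Standard DPO loss for one preference pair (hypothetical helper).

    Each argument is the summed log-probability of a response under the
    trainable policy or the frozen reference model.
    """
    # How much more the policy favors each sample than the reference does.
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    # The loss shrinks as the chosen margin exceeds the rejected margin.
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


# Illustrative numbers only: the policy prefers the chosen sample more
# strongly than the reference does, so the loss falls below log(2).
example = dpo_loss(-5.0, -9.0, -6.0, -8.0)
```

When the policy and reference agree on both samples, the loss equals log(2); it decreases as the policy shifts probability toward the preferred sample relative to the reference.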
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Big Data and Digital Economy
Methods: Softmax · Attention Is All You Need
