Mitigating Hallucinations in Large Vision-Language Models via Entity-Centric Multimodal Preference Optimization
Jiulong Wu, Zhengliang Shi, Shuaiqiang Wang, Jizhou Huang, Dawei Yin, Lingyong Yan, Min Cao, Min Zhang

TL;DR
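This paper introduces EMPO, a preference optimization method that reduces hallucinations in large vision-language models by strengthening image-text modality alignment and by automatically constructing preference data from open-source instruction datasets. It reduces hallucination rates by 85.9% on Object-HalBench and 49.8% on MM-HalBench.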
Contribution
The paper proposes Entity-centric Multimodal Preference Optimization (EMPO), a novel approach that enhances modality alignment and leverages open-source datasets to mitigate hallucinations in LVLMs.
Findings
EMPO reduces hallucination rates by 85.9% on Object-HalBench.
EMPO reduces hallucination rates by 49.8% on MM-HalBench.
Enhanced modality alignment improves trustworthiness of LVLMs.
Abstract
Large Visual Language Models (LVLMs) have demonstrated impressive capabilities across multiple tasks. However, their trustworthiness is often challenged by hallucinations, which can be attributed to modality misalignment and to the inherent hallucinations of their underlying Large Language Model (LLM) backbones. Existing preference alignment methods focus on aligning model responses with human preferences while neglecting image-text modality alignment, resulting in over-reliance on LLMs and hallucinations. In this paper, we propose Entity-centric Multimodal Preference Optimization (EMPO), which achieves enhanced modality alignment compared to existing human preference alignment methods. In addition, to overcome the scarcity of high-quality multimodal preference data, we utilize open-source instruction datasets to automatically construct high-quality preference data across three aspects:…
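For readers unfamiliar with preference alignment, the sketch below shows the per-pair loss of standard Direct Preference Optimization (DPO), one widely used method of the kind the abstract refers to. It is not EMPO's objective, which this summary does not spell out; the function name `dpo_loss`, its parameters, and the default `beta` are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Illustrative sketch only: this is generic DPO, not the EMPO objective.
    Each argument is the summed log-probability of a full response under
    the trainable policy or the frozen reference model.
    """
    # How much more the policy favors each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = chosen_gain - rejected_gain

    # -log(sigmoid(beta * margin)) written as a numerically stable softplus.
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# When the policy equals the reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-1.0, -2.0, -1.0, -2.0)

# When the policy shifts probability toward the chosen response, the loss drops.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

The loss only depends on how the policy's preference between the two responses differs from the reference model's, which is why the quality of the chosen/rejected pairs matters so much for this family of methods.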
Taxonomy
Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · Topic Modeling
