Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization

Yuhan Fu; Ruobing Xie; Xingwu Sun; Zhanhui Kang; Xirong Li

arXiv:2411.10436·cs.CL·November 18, 2024

TL;DR

This paper introduces Hallucination-targeted Direct Preference Optimization (HDPO), a novel method to reduce hallucinations in Multimodal Large Language Models by addressing their diverse causes, leading to improved performance over existing approaches.

Contribution

The paper proposes HDPO, a new approach that builds preference data targeting specific causes of hallucination in MLLMs and outperforms most previous methods in hallucination mitigation.

Findings

01

HDPO surpasses most SOTA methods in hallucination reduction.

02

The method targets hallucinations arising from three causes: insufficient visual capabilities, long context generation, and multimodal conflicts.

03

Ablation studies confirm the method's effectiveness and suggest potential for further scaling.

Abstract

Multimodal Large Language Models (MLLMs) are known to hallucinate, which limits their practical applications. Recent works have attempted to apply Direct Preference Optimization (DPO) to enhance the performance of MLLMs, but have shown inconsistent improvements in mitigating hallucinations. To address this issue more effectively, we introduce Hallucination-targeted Direct Preference Optimization (HDPO) to reduce hallucinations in MLLMs. Unlike previous approaches, our method tackles hallucinations from their diverse forms and causes. Specifically, we develop three types of preference pair data targeting the following causes of MLLM hallucinations: (1) insufficient visual capabilities, (2) long context generation, and (3) multimodal conflicts. Experimental results demonstrate that our method achieves superior performance across multiple hallucination evaluation datasets, surpassing most…
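
To make the training objective concrete, here is a minimal sketch of the standard DPO loss applied to a single preference pair tagged with one of the three hallucination causes named in the abstract. It is not the authors' implementation: the `PreferencePair` class, the cause labels in `CAUSES`, and the default `beta=0.1` are assumptions made for illustration, and the log-probabilities are taken as precomputed numbers rather than produced by a real model.

```python
import math
from dataclasses import dataclass

# Hypothetical labels for the three causes listed in the abstract.
CAUSES = ("visual_capability", "long_context", "multimodal_conflict")


@dataclass
class PreferencePair:
    """One (chosen, rejected) response pair, summarized by sequence log-probs."""
    cause: str                   # which hallucination cause the pair targets
    policy_chosen_logp: float    # log pi_theta(y_chosen | image, prompt)
    policy_rejected_logp: float  # log pi_theta(y_rejected | image, prompt)
    ref_chosen_logp: float       # log pi_ref(y_chosen | image, prompt)
    ref_rejected_logp: float     # log pi_ref(y_rejected | image, prompt)


def dpo_loss(pair: PreferencePair, beta: float = 0.1) -> float:
    """Standard DPO loss: -log sigmoid(beta * (chosen_ratio - rejected_ratio))."""
    if pair.cause not in CAUSES:
        raise ValueError(f"unknown cause: {pair.cause}")
    chosen_ratio = pair.policy_chosen_logp - pair.ref_chosen_logp
    rejected_ratio = pair.policy_rejected_logp - pair.ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# A pair where the policy already favors the non-hallucinated response
# more than the reference model does, so the loss falls below log(2).
example = PreferencePair("long_context", -10.0, -12.0, -11.0, -11.0)
example_loss = dpo_loss(example)
```

The loss equals log 2 when the policy and the reference model agree on both responses, and it shrinks as the policy raises the chosen response relative to the rejected one. In this sketch the three causes differ only in how the pairs would be constructed, not in the loss itself.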

Taxonomy

Topics: Topological and Geometric Data Analysis · Advanced Graph Neural Networks · Machine Learning in Healthcare