Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning

Shengqiong Wu; Hao Fei; Liangming Pan; William Yang Wang; Shuicheng Yan; Tat-Seng Chua

arXiv:2412.11124·cs.CV·December 24, 2024

TL;DR

This paper introduces a bottom-up reasoning framework for multimodal large language models to reduce hallucinations by verifying visual and textual inputs with commonsense knowledge, leading to more reliable outputs.

Contribution

It proposes a holistic reasoning approach that combines perception-level and cognition-level verification to combat hallucinations in MLLMs.

Findings

1. Significant improvements on hallucination benchmarks
2. Enhanced reliability of multimodal outputs
3. Effective handling of perception- and cognition-level hallucinations

Abstract

Recent advancements in multimodal large language models (MLLMs) have shown unprecedented capabilities in advancing various vision-language tasks. However, MLLMs face significant challenges with hallucinations, that is, misleading outputs that do not align with the input data. While efforts have been made to combat MLLM hallucinations, several pivotal challenges remain unsolved. First, while current approaches focus heavily on addressing errors at the perception level, another important type at the cognition level, which requires factual commonsense, can be overlooked. In addition, existing methods might fall short in finding a more effective way to represent visual input, which remains a key bottleneck that triggers visual hallucinations. Moreover, MLLMs can frequently be misled by faulty textual inputs and produce hallucinations, yet unfortunately this type of issue has long been…

Taxonomy

Topics: Hallucinations in medical conditions

Methods: Focus · ALIGN