Mitigating Behavioral Hallucination in Multimodal Large Language Models for Sequential Images

Liangliang You; Junchi Yao; Shu Yang; Guimin Hu; Lijie Hu; Di Wang

arXiv:2506.07184 · cs.AI · June 10, 2025

Open Access

TL;DR

This paper introduces SHE, a lightweight framework that detects and mitigates behavioral hallucinations in multimodal large language models handling sequential images, improving reliability without sacrificing accuracy.

Contribution

The work identifies the causes of behavioral hallucination and proposes SHE, a two-stage method for reducing hallucinations in sequential image tasks, along with a new metric, BEACH, for measuring them.

Findings

01. SHE reduces behavioral hallucination by over 10% on BEACH.

02. SHE maintains descriptive accuracy in benchmarks.

03. The adaptive temporal window improves hallucination detection.

Abstract

While multimodal large language models excel at various tasks, they still suffer from hallucinations, which limit their reliability and scalability in broader domain applications. Recent research on this issue focuses mainly on objective hallucination. For sequential images, however, there is also behavioral hallucination, which has received less study. This work aims to fill that gap. We first reveal that behavioral hallucinations arise mainly from two key factors: prior-driven bias and the snowball effect. Based on these observations, we introduce SHE (Sequence Hallucination Eradication), a lightweight, two-stage framework that (1) detects hallucinations via a visual-textual alignment check using our proposed adaptive temporal window and (2) mitigates them via orthogonal projection onto the joint embedding space. We also propose a new metric (BEACH)…
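The two stages named in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not specify how the adaptive temporal window is chosen or how the projection is constructed, so the code below assumes a fixed, caller-supplied window, uses cosine similarity as a stand-in alignment score, and shows only the generic linear-algebra operation of removing a vector's component along one direction. All function names, the threshold, and the toy embeddings are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)


def is_suspect(text_emb, frame_embs, window, threshold):
    """Stage 1 stand-in (assumption): flag a behavior description whose best
    cosine similarity to any frame in frame_embs[start:end] is below threshold.

    The window here is fixed and supplied by the caller; the paper's adaptive
    window selection is not described in the abstract and is not modeled.
    """
    start, end = window
    frames = frame_embs[start:end]
    if not frames:
        raise ValueError("window selects no frames")
    best = max(cosine(text_emb, f) for f in frames)
    return best < threshold, best


def project_out(vec, direction):
    """Stage 2 stand-in (assumption): return vec minus its component along
    direction, i.e. the orthogonal projection of vec onto the complement of
    that direction. A zero direction leaves vec unchanged.
    """
    denom = dot(direction, direction)
    if denom == 0.0:
        return list(vec)
    scale = dot(vec, direction) / denom
    return [v - scale * d for v, d in zip(vec, direction)]


# Toy 2-D embeddings, purely illustrative.
frames = [[1.0, 0.0], [0.0, 1.0]]
text = [0.0, 1.0]

flag_narrow, score_narrow = is_suspect(text, frames, (0, 1), threshold=0.5)
flag_wide, score_wide = is_suspect(text, frames, (0, 2), threshold=0.5)
cleaned = project_out([3.0, 4.0], [1.0, 0.0])
```

In this toy case the description matches only the second frame, so a window covering just the first frame flags it while a wider window does not, which hints at why the choice of window affects detection. The projected vector has no remaining component along the removed direction.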

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Misinformation and Its Impacts