LISA: A Layer-wise Integration and Suppression Approach for Hallucination Mitigation in Multimodal Large Language Models

Zhihui Guo; Xin Man; Hui Xu; Jie Shao; Zhiguo Jiang; Xianchao Zhang; Heng Tao Shen

arXiv:2507.19110 · cs.CV · November 14, 2025

TL;DR

LISA is a layer-wise, plug-and-play method that mitigates object hallucinations in multimodal large language models (MLLMs) through spectral modulation and adaptive token-level logit fusion, reducing hallucinations by up to 53.6% on CHAIR_I and improving POPE F1 by up to 5.1%.

Contribution

LISA is a plug-and-play method that combines layer-wise spectral modulation with token-level logit fusion to reduce hallucinations in MLLMs, improving their reliability without retraining.

Findings

01. Reduces hallucinations by up to 53.6% as measured by CHAIR_I.

02. Improves the POPE F1 score by up to 5.1%.

03. Generalizes well across different models and tasks.
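
For readers unfamiliar with the two metrics named in the findings, their standard definitions are sketched below. The relative-reduction formula is an assumption about how a figure such as "up to 53.6%" is usually computed; this listing does not state whether the reported number is relative or absolute, and the superscripts "base" and "LISA" are labels introduced here for illustration.

```latex
% CHAIR_I: fraction of mentioned object instances that are not in the image (lower is better)
\mathrm{CHAIR}_I = \frac{\left|\{\text{hallucinated object mentions}\}\right|}{\left|\{\text{all object mentions}\}\right|}

% F1 as used for POPE's yes/no object-presence questions (higher is better)
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}

% Assumed form of a relative reduction in CHAIR_I
\Delta_{\mathrm{rel}} = \frac{\mathrm{CHAIR}_I^{\mathrm{base}} - \mathrm{CHAIR}_I^{\mathrm{LISA}}}{\mathrm{CHAIR}_I^{\mathrm{base}}}
```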

Abstract

Multimodal Large Language Models (MLLMs) excel in vision-language tasks such as image captioning but remain prone to object hallucinations, where they describe objects that do not appear in the image. To mitigate this, we propose LISA, a Layer-wise Integration and Suppression Approach. LISA leverages the layer-wise functional roles in MLLMs: shallow layers provide visual grounding, middle layers encode semantics, and deep layers tend to amplify spurious signals. First, layer-wise spectral modulation stabilizes attention by suppressing over-amplified activations in deeper layers while preserving alignment cues in earlier layers. Second, token-level logits from selected layers are fused via anchor-based routing, with token-wise anchor selection and soft logit fusion enabling adaptive integration during decoding. LISA is fully plug-and-play and can be seamlessly integrated into existing…
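
The two stages named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the formulas, so every detail here is an assumption made for illustration. In particular, `spectral_clip` (capping the largest singular values of a matrix), the lowest-entropy rule for choosing the anchor layer, the KL-based fusion weights, and the parameters `max_ratio` and `temperature` are all hypothetical stand-ins.

```python
import numpy as np


def spectral_clip(weight, max_ratio=0.5):
    """Hypothetical stand-in for spectral modulation: cap every singular
    value at max_ratio times the largest one, damping dominant directions."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    cap = max_ratio * s.max()
    return (u * np.minimum(s, cap)) @ vt


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_logits(layer_logits, temperature=1.0):
    """Fuse next-token logits from several layers for one decoding step.

    layer_logits has shape (num_layers, vocab_size).
    Assumed anchor rule: the layer with the lowest predictive entropy.
    Assumed soft fusion: weights are a softmax over the negative KL
    divergence between the anchor distribution and each layer's distribution.
    """
    eps = 1e-12
    probs = softmax(layer_logits, axis=-1)
    log_probs = np.log(probs + eps)
    entropy = -(probs * log_probs).sum(axis=-1)
    anchor = int(entropy.argmin())
    kl = (probs[anchor] * (log_probs[anchor] - log_probs)).sum(axis=-1)
    weights = softmax(-kl / temperature, axis=0)
    fused = (weights[:, None] * layer_logits).sum(axis=0)
    return fused, anchor, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy matrix standing in for a deep-layer activation or attention map.
    deep = rng.normal(size=(8, 8))
    damped = spectral_clip(deep, max_ratio=0.5)
    print("top singular value before:", np.linalg.svd(deep, compute_uv=False)[0])
    print("top singular value after: ", np.linalg.svd(damped, compute_uv=False)[0])

    # Toy per-layer logits for a single token over a 10-word vocabulary.
    logits = rng.normal(size=(4, 10))
    fused, anchor, weights = fuse_logits(logits)
    print("anchor layer:", anchor, "fusion weights:", np.round(weights, 3))
```

Because both functions operate only on arrays computed at inference time, a sketch like this needs no gradient updates, which is consistent with the abstract's claim that the method is plug-and-play. How LISA actually selects layers and computes its fusion weights should be taken from the paper itself.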

Taxonomy

Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis