LISA: A Layer-wise Integration and Suppression Approach for Hallucination Mitigation in Multimodal Large Language Models
Zhihui Guo, Xin Man, Hui Xu, Jie Shao, Zhiguo Jiang, Xianchao Zhang, Heng Tao Shen

TL;DR
LISA is a novel layer-wise approach that mitigates object hallucinations in multimodal large language models by spectral modulation and adaptive token fusion, significantly improving factual accuracy across benchmarks.
Contribution
LISA introduces a plug-and-play method combining spectral modulation and token-level fusion to reduce hallucinations in MLLMs, enhancing their reliability without retraining.
Findings
Reduces hallucinations by up to 53.6% on CHAIR_I benchmark.
Improves POPE F1 score by up to 5.1%.
Demonstrates strong generalization across different models and tasks.
Abstract
Multimodal Large Language Models (MLLMs) excel in vision-language tasks such as image captioning but remain prone to object hallucinations, where they describe objects that do not appear in the image. To mitigate this, we propose LISA, a Layer-wise Integration and Suppression Approach. LISA leverages the layer-wise functional roles in MLLMs: shallow layers provide visual grounding, middle layers encode semantics, and deep layers tend to amplify spurious signals. First, layer-wise spectral modulation stabilizes attention by suppressing over-amplified activations in deeper layers while preserving alignment cues in earlier layers. Second, token-level logits from selected layers are fused via anchor-based routing, with token-wise anchor selection and soft logit fusion enabling adaptive integration during decoding. LISA is fully plug-and-play and can be seamlessly integrated into existing…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis
