Mitigating Hallucinations in Large Vision-Language Models without Performance Degradation

Xingyu Zhu; Junfeng Fang; Shuo Wang; Beier Zhu; Zhicai Wang; Yonghui Yang; Xiangnan He

arXiv:2604.20366·cs.CV·April 23, 2026

TL;DR

This paper introduces MPD, a dual-stage framework that effectively reduces hallucinations in large vision-language models without degrading their overall generative performance.

Contribution

The proposed MPD method employs semantic-aware disentanglement and interpretable parameter updates to mitigate hallucinations efficiently without performance loss.

Findings

01

Reduces hallucinations by 23.4% on benchmark datasets.

02

Maintains 97.4% of original generative capability.

03

Achieves state-of-the-art results with no extra computational cost.

Abstract

Large Vision-Language Models (LVLMs) exhibit powerful generative capabilities but frequently produce hallucinations that compromise output reliability. Fine-tuning on hallucination-free annotated data is the most direct solution, but its high computational cost has motivated recent representation-based methods, which mitigate hallucinatory components within hidden representations. Although these methods are efficient, we empirically observe that they degrade general generation capacity because they extract hallucination components incompletely and update parameters non-selectively. To address these limitations, we propose MPD, a dual-stage framework for mitigating hallucinations without performance degradation. Specifically, MPD relies on two essential factors: (1) semantic-aware component disentanglement to extract pure hallucination components, and (2) interpretable parameter…
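
To make the phrase "mitigating hallucinatory components within hidden representations" concrete, here is a toy sketch of one generic form of representation-based editing: estimate a direction from the difference between mean hidden states of hallucinated and faithful outputs, then project that direction out of a new hidden state. This is not MPD, whose details are not given in the text above. The function names, the difference-of-means estimator, and the three-dimensional vectors are all assumptions chosen for illustration.

```python
import math


def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def unit_direction(hallucinated, faithful):
    """Hypothetical estimator: normalized difference of the two class means."""
    mu_h = mean_vector(hallucinated)
    mu_f = mean_vector(faithful)
    diff = [a - b for a, b in zip(mu_h, mu_f)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("class means coincide; no direction to remove")
    return [x / norm for x in diff]


def project_out(hidden, direction):
    """Remove the component of `hidden` that lies along the unit `direction`."""
    coeff = sum(h * d for h, d in zip(hidden, direction))
    return [h - coeff * d for h, d in zip(hidden, direction)]


# Toy hidden states (assumed data): the two groups differ only in dimension 0.
hallucinated = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
faithful = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

direction = unit_direction(hallucinated, faithful)
edited = project_out([3.0, 1.0, 5.0], direction)
print(direction, edited)
```

In this toy setup the estimated direction coincides with the first coordinate axis, so the edit zeroes that coordinate and leaves the other two untouched. Real hidden states rarely separate this cleanly, which is one reason a single estimated direction may capture only part of what one wants to remove.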
