Mitigating Multimodal Hallucination via Phase-wise Self-reward

Yu Zhang; Chuyang Sun; Kehai Chen; Xuefeng Bai; Yang Xiang; Min Zhang

arXiv:2604.17982·cs.CV·April 21, 2026

TL;DR

This paper introduces PSRD, an inference-time decoding framework that mitigates vision hallucination in LVLMs using phase-wise self-reward signals and a lightweight reward model.

Contribution

It proposes a phase-wise self-reward decoding method that suppresses hallucinations dynamically at inference time, without external supervision.

Findings

01

PSRD reduces the hallucination rate of LLaVA-1.5-7B by 50%.

02

It outperforms existing post-hoc methods across five benchmarks.

03

The approach enables a controllable trade-off between performance and inference efficiency.

Abstract

Large Vision-Language Models (LVLMs) still struggle with vision hallucination, where generated responses are inconsistent with the visual input. Existing methods either rely on large-scale annotated data for fine-tuning, which incurs massive computational overhead, or employ static post-hoc strategies that overlook the dynamic nature of hallucination emergence. To address these limitations, we introduce a new self-rewarding framework, enabling dynamic hallucination mitigation at inference time without external supervision. On the empirical side, we reveal that visual hallucination exhibits phase-wise dynamic patterns, peaking at the onset of each semantic phase. Drawing on these insights, we propose PSRD (Phase-wise Self-Reward Decoding) for online hallucination correction guided by phase-wise self-reward signals. To reduce the cost of repeated…
