Mitigating Multimodal Hallucination via Phase-wise Self-reward
Yu Zhang, Chuyang Sun, Kehai Chen, Xuefeng Bai, Yang Xiang, Min Zhang

TL;DR
This paper introduces PSRD, a dynamic inference-time framework that mitigates vision hallucination in LVLMs using phase-wise self-reward signals and a lightweight reward model.
Contribution
It proposes a novel phase-wise self-reward decoding method that dynamically suppresses hallucinations without external supervision, improving efficiency and effectiveness.
Findings
PSRD reduces the hallucination rate of LLaVA-1.5-7B by 50%.
It outperforms existing post-hoc methods across five benchmarks.
The approach enables a controllable trade-off between performance and inference efficiency.
Abstract
Large Vision-Language Models (LVLMs) still struggle with vision hallucination, where generated responses are inconsistent with the visual input. Existing methods either rely on large-scale annotated data for fine-tuning, which incurs massive computational overhead, or employ static post-hoc strategies that overlook the dynamic nature of hallucination emergence. To address these limitations, we introduce a new self-rewarding framework that enables dynamic hallucination mitigation at inference time without external supervision. Empirically, we show that visual hallucination exhibits phase-wise dynamic patterns, peaking at the onset of each semantic phase. Drawing on these insights, we propose PSRD (Phase-wise Self-Reward Decoding) for online hallucination correction guided by phase-wise self-reward signals. To reduce the cost of repeated…
