MRFD: Multi-Region Fusion Decoding with Self-Consistency for Mitigating Hallucinations in LVLMs

Haonan Ge; Yiwei Wang; Ming-Hsuan Yang; Yujun Cai

arXiv:2508.10264 · cs.CV · October 14, 2025

TL;DR

This paper introduces MRFD, a decoding method that reduces hallucinations in LVLMs by modeling inter-region consistency without retraining, leading to more factual responses across various benchmarks.

Contribution

MRFD is a novel, training-free decoding approach that enhances factual grounding in LVLMs by leveraging inter-region consistency and reliability weighting.

Findings

01. Significantly reduces hallucinations in LVLM outputs.

02. Improves factual accuracy across multiple benchmarks.

03. Does not require additional model training or fine-tuning.

Abstract

Large Vision-Language Models (LVLMs) have shown strong performance across multimodal tasks. However, they often produce hallucinations (text that is inconsistent with the visual input) because of their limited ability to verify information across different regions of the image. To address this, we propose Multi-Region Fusion Decoding (MRFD), a training-free decoding method that improves factual grounding by modeling inter-region consistency. MRFD identifies salient regions using cross-attention, generates an initial response for each region, and computes reliability weights based on the Jensen-Shannon Divergence (JSD) among the responses. These weights guide a consistency-aware fusion of per-region predictions, using region-aware prompts inspired by Chain-of-Thought reasoning. Experiments across multiple LVLMs and benchmarks show that MRFD significantly reduces hallucinations and improves response factuality…
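
The abstract does not give the exact weighting or fusion formulas, so the following is only a minimal sketch of how JSD-based reliability weights could drive a fusion of per-region predictions. It assumes each region yields a probability distribution over the same vocabulary, that a region's reliability is a softmax over its negated mean JSD to the other regions, and that fusion is a weighted average. The function names (`jsd`, `reliability_weights`, `fuse`) and the `temperature` parameter are hypothetical and are not taken from the paper.

```python
import math


def kl(p, q):
    """KL divergence KL(p || q) in nats; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence: symmetric, bounded above by ln(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def reliability_weights(dists, temperature=1.0):
    """Assumed scheme: a region that disagrees with the others gets less weight.

    Each region's score is the negative of its mean JSD to every other
    region; a softmax turns the scores into weights that sum to 1.
    """
    n = len(dists)
    mean_div = []
    for i in range(n):
        others = [jsd(dists[i], dists[j]) for j in range(n) if j != i]
        mean_div.append(sum(others) / len(others) if others else 0.0)
    scores = [-d / temperature for d in mean_div]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(dists, weights):
    """Assumed fusion: weighted average of the per-region distributions."""
    vocab = len(dists[0])
    fused = [sum(w * d[k] for w, d in zip(weights, dists)) for k in range(vocab)]
    total = sum(fused)
    return [f / total for f in fused]


# Toy example: regions 0 and 1 roughly agree, region 2 is an outlier.
region_dists = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
]
weights = reliability_weights(region_dists)
fused = fuse(region_dists, weights)
```

In this toy setup the outlier region receives the smallest weight, so the fused distribution leans toward the two regions that agree. A lower `temperature` would sharpen that preference; the value the authors use, if any, is not stated in the abstract.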
