Enhancing Visual Reliance in Text Generation: A Bayesian Perspective on Mitigating Hallucination in Large Vision-Language Models

Nanxing Hu; Xiaoyue Duan; Jinchao Zhang; Guoliang Kang

arXiv:2505.19498 · cs.CV · August 20, 2025

TL;DR

This paper introduces a Bayesian approach to reducing hallucinations in large vision-language models (LVLMs). It evaluates the informativeness of visual tokens, rectifies prior information, and stops generation early, which improves the alignment of generated text with the visual input.

Contribution

The paper systematically analyzes factors causing hallucination in LVLMs and proposes a Bayesian framework with three strategies to enhance visual reliance in text generation.

Findings

01

Significant reduction in hallucinations across three benchmarks.

02

Improved alignment between generated text and visual input.

03

Outperforms previous state-of-the-art methods in mitigating hallucination.

Abstract

Large Vision-Language Models (LVLMs) usually generate text that is contextually coherent but does not match the visual input. This hallucination issue hinders the real-world applicability of LVLMs. The key to solving hallucination in LVLMs is to make text generation rely more on the visual content. Most previous works enhance or adjust the features or outputs of a single modality (i.e., visual or textual) to alleviate hallucinations, which does not explicitly or systematically enhance visual reliance. In this paper, we comprehensively investigate, from a Bayesian perspective, the factors that may degrade visual reliance in LVLM text generation. Based on our observations, we propose to mitigate hallucination in LVLMs from three aspects. First, we observe that not all visual tokens are informative for generating meaningful text. We propose to evaluate and…
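To make the phrase "Bayesian perspective" concrete, here is a generic application of Bayes' rule to next-token prediction. The abstract is truncated before the paper's own formulation, so this is an illustrative sketch only. The notation (v for the visual input, x for the text prompt, y_{<t} for previously generated tokens, c = (x, y_{<t})) is assumed and not taken from the paper.

```latex
% Assumed notation: v = visual input, c = (x, y_{<t}) = prompt plus generated prefix,
% y_t = candidate next token. p(v | c) does not depend on y_t, so it is a constant
% over candidates and can be dropped under proportionality.
p(y_t \mid v, c)
  \;=\; \frac{p(v \mid y_t, c)\, p(y_t \mid c)}{p(v \mid c)}
  \;\propto\; \underbrace{p(v \mid y_t, c)}_{\text{visual likelihood}}
  \;\cdot\; \underbrace{p(y_t \mid c)}_{\text{language prior}}
```

Read this way, when the language prior dominates the visual likelihood, the next token is chosen mainly for coherence with the textual context c rather than for consistency with v. This matches the hallucination pattern described in the abstract.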
