Grounding Language with Vision: A Conditional Mutual Information Calibrated Decoding Strategy for Reducing Hallucinations in LVLMs

Hao Fang; Changle Zhou; Jiawei Kong; Kuofeng Gao; Bin Chen; Shu-Tao Xia

arXiv:2505.19678 · cs.CL · October 24, 2025

TL;DR

This paper introduces a decoding strategy for Large Vision-Language Models (LVLMs) that reduces hallucinations by adaptively strengthening the mutual dependency between generated responses and input images, so that outputs stay relevant to the image.

Contribution

The paper proposes a Conditional Pointwise Mutual Information (C-PMI) calibrated decoding method that jointly models the contributions of visual and textual tokens, formulating hallucination mitigation as a bi-level optimization problem.

Findings

01. Significantly reduces hallucinations in LVLMs.

02. Maintains decoding efficiency while improving relevance.

03. Effective across various benchmark datasets.

Abstract

Large Vision-Language Models (LVLMs) are susceptible to hallucinations, where generated responses seem semantically plausible yet exhibit little or no relevance to the input image. Previous studies reveal that this issue primarily stems from LVLMs' over-reliance on language priors while disregarding the visual information during decoding. To alleviate this issue, we introduce a novel Conditional Pointwise Mutual Information (C-PMI) calibrated decoding strategy, which adaptively strengthens the mutual dependency between generated texts and input images to mitigate hallucinations. Unlike existing methods solely focusing on text token sampling, we propose to jointly model the contributions of visual and textual tokens to C-PMI, formulating hallucination mitigation as a bi-level optimization problem aimed at maximizing mutual information. To solve it, we design a token purification…
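To make the central quantity concrete, the following is a minimal sketch of how a per-token C-PMI score could be computed and used to reweight next-token probabilities. It relies only on the standard definition of conditional pointwise mutual information, log p(y | v, x) - log p(y | x), where v is the image and x is the text context. It is not the authors' algorithm: the bi-level optimization and token purification described in the abstract are not reproduced here. The function names, the weight `lam`, and the toy logits are all hypothetical and chosen for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cpmi_scores(logits_with_image, logits_text_only):
    """Per-token C-PMI: log p(y | v, x) - log p(y | x).

    Positive scores mark tokens that the image makes more likely;
    negative scores mark tokens favored by the language prior alone.
    """
    lp_v = log_softmax(logits_with_image)
    lp_t = log_softmax(logits_text_only)
    return [a - b for a, b in zip(lp_v, lp_t)]


def calibrated_log_probs(logits_with_image, logits_text_only, lam=1.0):
    """Assumed calibration rule (not taken from the paper): add
    lam * C-PMI to the image-conditioned log-probabilities, then
    renormalize. lam = 0 recovers ordinary decoding."""
    lp_v = log_softmax(logits_with_image)
    pmi = cpmi_scores(logits_with_image, logits_text_only)
    return log_softmax([lp + lam * s for lp, s in zip(lp_v, pmi)])


# Toy vocabulary of three tokens with made-up logits.
with_image = [2.0, 1.0, 0.5]   # model sees image and text
text_only = [1.0, 2.0, 0.5]    # model sees text only (language prior)

scores = cpmi_scores(with_image, text_only)
calibrated = calibrated_log_probs(with_image, text_only, lam=1.0)
best = max(range(len(calibrated)), key=calibrated.__getitem__)
```

In this toy case the language prior prefers the second token, while the image-conditioned logits prefer the first. The C-PMI term widens that gap in favor of the visually supported token. In practice the two sets of logits would come from two forward passes of the same model, one with and one without the image, which is an assumption about the setup rather than a detail given in the abstract.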

Taxonomy

Topics: Psychiatry, Mental Health, Neuroscience · EEG and Brain-Computer Interfaces · Topological and Geometric Data Analysis