Mitigating Hallucinations in Large Vision-Language Models by Adaptively Constraining Information Flow

Jiaqi Bai; Hongcheng Guo; Zhongyuan Peng; Jian Yang; Zhoujun Li; Mohan Li; Zhihong Tian

arXiv:2502.20750·cs.CL·March 3, 2025

TL;DR

This paper introduces AdaVIB, a method that reduces hallucinations in vision-language models by adaptively constraining irrelevant visual information using stochastic noise and information bottleneck techniques.

Contribution

The paper proposes AdaVIB, an adaptive noise-based method that mitigates object hallucinations in vision-language models by controlling overconfidence in irrelevant features.
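The noise-injection idea behind a variational information bottleneck can be sketched in a few lines. The following is a minimal, generic VIB illustration and not the authors' implementation: the function names, the diagonal-Gaussian posterior, the standard-normal prior, and the fixed weight `beta` are all assumptions made here. In particular, the adaptive rule that gives AdaVIB its name is not reproduced.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterized sample: z = mu + sigma * eps, with eps ~ N(0, 1).

    mu and log_var are per-dimension lists (hypothetical encoder outputs).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))


def vib_loss(task_loss, mu, log_var, beta):
    """Task loss plus a beta-weighted compression penalty.

    beta is a fixed constant here (an assumption for illustration);
    a larger beta pushes the noisy representation toward the prior.
    """
    return task_loss + beta * kl_to_standard_normal(mu, log_var)


rng = random.Random(0)
mu, log_var = [1.0, 0.0], [0.0, 0.0]
z = sample_latent(mu, log_var, rng)      # noisy version of mu
loss = vib_loss(1.0, mu, log_var, beta=0.1)
```

The KL term is what constrains information flow in this sketch: it is zero only when the posterior matches the prior, so any information the representation carries has to be worth its cost in the task loss.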

Findings

1. Significant reduction in object hallucinations across benchmarks.

2. Improved alignment between visual features and language descriptions.

3. Enhanced robustness of vision-language models to irrelevant visual information.

Abstract

Large vision-language models show tremendous potential in understanding visual information through human languages. However, they are prone to suffer from object hallucination, i.e., the generated image descriptions contain objects that do not exist in the image. In this paper, we reveal that object hallucination can be attributed to overconfidence in irrelevant visual features when soft visual tokens map to the LLM's word embedding space. Specifically, by figuring out the semantic similarity between visual tokens and LLM's word embedding, we observe that the smoothness of similarity distribution strongly correlates with the emergence of object hallucinations. To mitigate hallucinations, we propose using the Variational Information Bottleneck (VIB) to alleviate overconfidence by introducing stochastic noise, facilitating the constraining of irrelevant information. Furthermore, we…
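One way to make the "smoothness of similarity distribution" mentioned in the abstract concrete is to turn the similarities between a single visual token and every word embedding into a probability distribution and measure its entropy. The sketch below assumes cosine similarity, a softmax with a `temperature` parameter, and Shannon entropy as the smoothness measure; the listing does not say which of these the paper actually uses, and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_distribution(visual_token, word_embeddings, temperature=1.0):
    """Softmax over the token's cosine similarity to each word embedding."""
    logits = [cosine(visual_token, w) / temperature for w in word_embeddings]
    peak = max(logits)                       # subtract max for stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    """Shannon entropy in nats; higher means a smoother distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


# Toy 2-d "vocabulary" of three word embeddings (made-up values).
vocab = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
peaked = similarity_distribution([1.0, 0.0], vocab, temperature=0.05)
smooth = similarity_distribution([1.0, 0.0], vocab, temperature=10.0)
print(entropy(peaked) < entropy(smooth))
```

In this toy setup the same token yields a sharply peaked distribution at low temperature and a nearly uniform one at high temperature, which is the kind of contrast an entropy-style smoothness score is meant to capture.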

Code & Models

Repositories

jiaqi5598/adavib (PyTorch, official)
