Toward More Reliable Artificial Intelligence: Reducing Hallucinations in Vision-Language Models

Kassoum Sanogo; Renzo Ardiccioni

arXiv:2512.07564·cs.CV·December 11, 2025

TL;DR

This paper introduces a training-free self-correction framework for vision-language models that reduces hallucinations by iteratively refining responses using uncertainty-guided visual re-attention, improving reliability without retraining.

Contribution

The proposed method enables VLMs to self-correct hallucinations through uncertainty-guided re-attention without gradient updates, validated on multiple benchmarks.
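The control flow this implies can be sketched as a simple loop around a frozen model. This is a minimal illustration, not the authors' implementation: the callables `generate`, `score_uncertainty`, and `pick_crop`, as well as the `threshold` and `max_rounds` values, are hypothetical placeholders that a real system would have to supply.

```python
def self_correct(image, question, generate, score_uncertainty, pick_crop,
                 threshold=0.5, max_rounds=3):
    """Refine an answer with a frozen model; no gradient updates are involved.

    All three callables are hypothetical stand-ins (assumptions):
      generate(image, question, extra_view) -> answer text
      score_uncertainty(answer)             -> float in [0, 1]
      pick_crop(image, answer)              -> an extra view of the image
    """
    answer = generate(image, question, None)  # first pass, full image only
    rounds = 0
    while rounds < max_rounds and score_uncertainty(answer) > threshold:
        # Re-attend: look again at a region chosen from the uncertain answer.
        extra_view = pick_crop(image, answer)
        answer = generate(image, question, extra_view)
        rounds += 1
    return answer, rounds
```

The loop stops as soon as the uncertainty score falls to the threshold or the round budget is spent, so a confident first answer costs only one forward pass.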

Findings

01 Hallucination rates reduced by 9.8 percentage points compared to the baseline

02 Object existence accuracy improved by 4.7 points on adversarial splits

03 Effective grounding of corrections in visual evidence

Abstract

Vision-language models (VLMs) frequently generate hallucinated content: plausible but incorrect claims about image content. We propose a training-free self-correction framework that enables VLMs to iteratively refine responses through uncertainty-guided visual re-attention. Our method combines multidimensional uncertainty quantification (token entropy, attention dispersion, semantic consistency, claim confidence) with attention-guided cropping of under-explored regions. Because it operates entirely with frozen, pretrained VLMs, the framework requires no gradient updates. We validate our approach on the POPE and MMHal-Bench benchmarks using the Qwen2.5-VL-7B [23] architecture. Experimental results show that our method reduces hallucination rates by 9.8 percentage points compared to the baseline, while improving object existence accuracy by 4.7 points on adversarial splits. Furthermore,…
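To make the four uncertainty signals and the cropping step more concrete, here is a rough sketch in plain Python. It is not taken from the paper: the normalization by log of the distribution size, the equal weights, the `1 - score` treatment of consistency and confidence, and the grid-cell cropping rule are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math

def normalized_entropy(dist):
    """Shannon entropy of a probability distribution, scaled to [0, 1]."""
    n = len(dist)
    if n < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in dist if p > 0.0)
    return h / math.log(n)

def combined_uncertainty(token_dists, attention, consistency, claim_conf,
                         weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted mix of the four signals; equal weights are an assumption.

    token_dists: one next-token probability distribution per generated token
    attention:   2D grid of non-negative attention mass over image patches
    consistency, claim_conf: scores in [0, 1], higher meaning more reliable
    """
    flat = [a for row in attention for a in row]
    total = sum(flat)
    attn_dist = [a / total for a in flat]
    signals = (
        sum(normalized_entropy(d) for d in token_dists) / len(token_dists),
        normalized_entropy(attn_dist),  # attention dispersion
        1.0 - consistency,              # low consistency -> high uncertainty
        1.0 - claim_conf,               # low confidence  -> high uncertainty
    )
    return sum(w * s for w, s in zip(weights, signals))

def least_attended_cell(attention):
    """Return (row, col) of the grid cell with the lowest attention mass."""
    cells = [(attention[r][c], r, c)
             for r in range(len(attention))
             for c in range(len(attention[0]))]
    _, row, col = min(cells)
    return row, col

def cell_to_box(row, col, grid_rows, grid_cols, width, height):
    """Pixel box (left, top, right, bottom) covering one grid cell."""
    cell_w, cell_h = width / grid_cols, height / grid_rows
    return (int(col * cell_w), int(row * cell_h),
            int((col + 1) * cell_w), int((row + 1) * cell_h))
```

In this sketch the box returned by `cell_to_box` would be handed to an image library's crop routine to produce the extra view of an under-explored region; how the paper actually selects and sizes crops is not stated in the abstract.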

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications