HALP: Detecting Hallucinations in Vision-Language Models without Generating a Single Token

Sai Akhil Kogilathota; Sripadha Vallabha E G; Luzhe Sun; Jiawei Zhou

arXiv:2603.05465·cs.CV·March 6, 2026

Open Access

TL;DR

This paper presents a method to predict hallucination risks in vision-language models before text generation by probing internal representations, enabling safer and more efficient model deployment.

Contribution

The paper introduces an approach that detects hallucination risk before generation by probing internal model representations across diverse VLMs, outperforming existing post-generation detection methods.

Findings

01 Probes achieve up to 0.93 AUROC in hallucination detection without decoding.

02 Pre-generation detection is effective across multiple architectures and modalities.

03 Different layers and features are most informative depending on the model architecture.

Abstract

Hallucinations remain a persistent challenge for vision-language models (VLMs), which often describe nonexistent objects or fabricate facts. Existing detection methods typically operate after text generation, making intervention both costly and untimely. We investigate whether hallucination risk can instead be predicted before any token is generated by probing a model's internal representations in a single forward pass. Across a diverse set of vision-language tasks and eight modern VLMs, including Llama-3.2-Vision, Gemma-3, Phi-4-VL, and Qwen2.5-VL, we examine three families of internal representations: (i) visual-only features without multimodal fusion, (ii) vision-token representations within the text decoder, and (iii) query-token representations that integrate visual and textual information before generation. Probes trained on these representations achieve strong…
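
The probing setup described in the abstract can be illustrated with a minimal sketch. It assumes a linear (logistic-regression) probe, which the text above does not specify, and it uses synthetic Gaussian vectors as a stand-in for pre-generation hidden states, since no real VLM activations are available here. All function names are hypothetical, and the AUROC it prints reflects only the toy data, not the paper's results.

```python
import math
import random


def make_synthetic_features(n, dim, seed=0):
    """Stand-in for pre-generation hidden states; NOT real VLM activations."""
    rng = random.Random(seed)
    feats, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)  # hypothetical label: 1 = answer would hallucinate
        # Shift the first four coordinates by the label so a linear probe has signal.
        x = [rng.gauss(0.8 * y if j < 4 else 0.0, 1.0) for j in range(dim)]
        feats.append(x)
        labels.append(y)
    return feats, labels


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probe_score(w, b, x):
    """Predicted hallucination risk for one feature vector."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_linear_probe(feats, labels, lr=0.1, epochs=200):
    """Full-batch gradient descent on the logistic loss (assumed probe type)."""
    dim, n = len(feats[0]), len(feats)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(feats, labels):
            err = probe_score(w, b, x) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Train on one synthetic split, evaluate on another; no tokens are decoded anywhere.
train_x, train_y = make_synthetic_features(400, 16, seed=0)
test_x, test_y = make_synthetic_features(200, 16, seed=1)
w, b = train_linear_probe(train_x, train_y)
test_scores = [probe_score(w, b, x) for x in test_x]
test_auroc = auroc(test_scores, test_y)
print(f"held-out AUROC on synthetic data: {test_auroc:.3f}")
```

In a real setting, `make_synthetic_features` would be replaced by hidden states extracted from a single forward pass over the image and prompt, taken from one of the three representation families the abstract lists, with labels derived from whether the model's eventual answer was judged a hallucination.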

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Digital Media Forensic Detection