MTRE: Multi-Token Reliability Estimation for Hallucination Detection in VLMs

Geigh Zollicoffer; Minh Vu; Manish Bhattarai

arXiv:2505.11741·cs.AI·October 22, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces MTRE, a novel method analyzing multiple tokens' logits in vision-language models to improve hallucination detection, significantly outperforming existing single-token approaches across various benchmarks.

Contribution

MTRE is a lightweight, white-box approach that leverages early token logits and self-attention to enhance hallucination detection in VLMs, addressing limitations of prior single-token methods.

Findings

1. MTRE achieves a 9.4% accuracy improvement over standard methods.
2. MTRE attains a 14.8% AUROC gain, establishing new state-of-the-art performance.
3. MTRE is effective across multiple diverse benchmarks and datasets.
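As a reminder of what the AUROC figure measures, here is a minimal sketch of a pairwise AUROC computation for a hallucination detector. The function name `auroc`, the label convention (1 = hallucinated, 0 = not hallucinated), and the toy scores are illustrative assumptions, not taken from the paper.

```python
def auroc(scores, labels):
    """Probability that a random positive scores higher than a random negative.

    scores: detector outputs, higher meaning "more likely hallucinated" (assumed).
    labels: 1 for hallucinated, 0 for non-hallucinated (assumed convention).
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one example of each class")

    wins = 0.0
    for sp in positives:
        for sn in negatives:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: one hallucinated example (score 0.2) is ranked below a clean one (0.8).
toy_scores = [0.9, 0.2, 0.8, 0.1]
toy_labels = [1, 1, 0, 0]
toy_auroc = auroc(toy_scores, toy_labels)
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a gain in AUROC means the detector orders hallucinated and non-hallucinated outputs more reliably, independent of any single decision threshold.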

Abstract

Vision-language models (VLMs) now rival human performance on many multimodal tasks, yet they still hallucinate objects or generate unsafe text. Current hallucination detectors, e.g., single-token linear probing (LP) and PTrue, typically analyze only the logit of the first generated token or just its highest-scoring component, overlooking richer signals embedded within earlier token distributions. We demonstrate that analyzing the complete sequence of early logits potentially provides substantially more diagnostic information. We emphasize that hallucinations may only emerge after several tokens, as subtle inconsistencies accumulate over time. By analyzing the Kullback-Leibler (KL) divergence between logits corresponding to hallucinated and non-hallucinated tokens, we underscore the importance of incorporating later-token logits to more accurately capture the reliability dynamics of…
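The KL divergence analysis mentioned in the abstract can be illustrated with a small sketch. It assumes, without confirmation from the paper, that each group's logits are converted to softmax distributions and averaged per token position before the divergence is computed. All function names and the toy data are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def mean_distribution(logit_rows):
    """Average the softmax distributions of several logit vectors."""
    probs = [softmax(row) for row in logit_rows]
    return [sum(col) / len(probs) for col in zip(*probs)]


def per_position_kl(hallucinated, truthful):
    """KL divergence between the two groups at each token position.

    hallucinated[i][t] and truthful[i][t] are the logit vectors of
    example i at generated-token position t (assumed layout).
    """
    num_positions = min(len(hallucinated[0]), len(truthful[0]))
    result = []
    for t in range(num_positions):
        p = mean_distribution([example[t] for example in hallucinated])
        q = mean_distribution([example[t] for example in truthful])
        result.append(kl_divergence(p, q))
    return result


# Toy data with a 3-word vocabulary and 2 token positions: the groups agree
# at position 0 and diverge only at position 1.
toy_hallucinated = [[[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]] * 2
toy_truthful = [[[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]] * 2
toy_kl = per_position_kl(toy_hallucinated, toy_truthful)
```

In this toy setup the first position carries no signal and the second does, which is the pattern the abstract points to when it argues for looking beyond the first generated token.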

Peer Reviews

Decision · ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. The paper identifies a key limitation of previous approaches that rely solely on the first-token logit, showing through KL divergence analysis that hallucination-related divergence often arises in later tokens. This motivates the multi-token design in a theoretically grounded way.
2. MTRE introduces a lightweight yet principled multi-token aggregation method, formulated as a calibrated sequential log-likelihood ratio test. The design effectively balances interpretability and computational ef
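The sequential log-likelihood ratio idea that Reviewer 01 describes can be sketched roughly as follows. This is not the authors' implementation: it assumes a hypothetical per-token classifier that outputs the probability that a response is reliable, equal class priors (so the log-odds equal the log-likelihood ratio), and a simple sum over the first `k` tokens. All names and numbers are illustrative.

```python
import math


def log_likelihood_ratio(p_reliable, eps=1e-6):
    """Log-odds of one per-token probability, clipped away from 0 and 1."""
    p = min(max(p_reliable, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def aggregate_llr(per_token_probs, k=None):
    """Sum per-token log-likelihood ratios over the first k tokens (all if k is None)."""
    probs = per_token_probs if k is None else per_token_probs[:k]
    return sum(log_likelihood_ratio(p) for p in probs)


def is_reliable(per_token_probs, threshold=0.0, k=None):
    """Flag a response as reliable when the aggregated evidence exceeds the threshold."""
    return aggregate_llr(per_token_probs, k) > threshold


# Hypothetical per-token outputs: the first token looks fine, later tokens do not.
toy_probs = [0.6, 0.4, 0.2, 0.1]
first_token_verdict = is_reliable(toy_probs, k=1)
multi_token_verdict = is_reliable(toy_probs)
```

With these toy numbers a first-token-only decision accepts the response, while the multi-token aggregate rejects it, which mirrors the motivation for aggregating evidence across several early tokens.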

Weaknesses

1. The current comparison is restricted to models such as LLaVA-v1.5 (7B), mPLUG-Owl, LLaMA-Adapter V2, and MiniGPT-4, which may now be considered relatively early-generation VLMs. It remains unclear how MTRE performs on more recent and stronger models (such as LLaVA-Next, InternVL2 or Qwen2.5-VL), which exhibit lower hallucination rates. Incorporating these models would better demonstrate the generality and contributions of MTRE.
2. All experiments use 7B-scale models. It would be informative

Reviewer 02 · Rating 4 · Confidence 5

Strengths

Clear Motivation and Rationale: The fundamental premise, that the full sequence of early logits contains more diagnostic information than the single first token, is clearly articulated and well-supported by preliminary analysis (Section 3).

Sound Empirical Insights: The paper provides several valuable empirical observations (Sections 3 and 5) that are useful for the VLM reliability community. For instance, the finding that hallucination divergence may emerge late in the sequence, and consequently

Weaknesses

Major Concerns

1. Limited Efficacy in Type I Setting: While MTRE significantly outperforms baselines in the Type II setting, its performance edge over the competitive Linear Probing (Lin. Prb.) baseline on Type I tasks (more common ones) is often marginal.
2. Incremental Gain of MTRE-$\tau$: MTRE-$\tau$ fails to demonstrate a clear and significant performance improvement over MTRE. Given that MTRE-$\tau$ introduces substantial additional complexity (cross-folding, parameter calibration, opti

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. The topic is interesting and tries to address an important problem.
2. The paper is well written.

Weaknesses

1. The baselines should include newer models, such as LLaVA 1.5, LLaVA NeXT, and Qwen 2.5 VL.
2. Figure 1 seems unclear; I recommend that the authors add more explanation.
3. For the benchmark discussion, note that several recent studies [1, 2, 3] both address hallucination and maintain (or even improve) performance in general scenarios. I recommend that the authors add benchmarks such as OCRBench, MMMU, and MME.

[1] Mitigating Object Hallucinations via Sentence-Level Early Intervention.
[2] A to

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Digital Media Forensic Detection · Multimodal Machine Learning Applications