Detecting Token-Level Hallucinations Using Variance Signals: A Reference-Free Approach
Keshav Kumar

TL;DR
This paper presents a novel, reference-free method for detecting token-level hallucinations in large language models by measuring the variance of token log-probabilities across multiple stochastic generations, enabling both real-time and post-hoc reliability assessment.
Contribution
It introduces a model-agnostic, interpretable framework that leverages variance signals for hallucination detection without requiring ground-truth references.
Findings
Token-level variance signals correlate with hallucination patterns.
The method performs well across models of different scales (GPT-Neo 125M, Falcon 1B, and Mistral 7B).
The framework is lightweight and adaptable to various domains.
Abstract
Large Language Models (LLMs) have demonstrated impressive generative capabilities across diverse tasks but remain susceptible to hallucinations: confidently generated yet factually incorrect outputs. We introduce a reference-free, token-level hallucination detection framework that leverages the variance in token log-probabilities across multiple stochastic generations. Unlike prior methods that require ground-truth references or sentence-level verification, our approach is model-agnostic, interpretable, and suited for real-time or post-hoc analysis. We evaluate our method on unanswerable question prompts from the SQuAD v2 dataset and benchmark it across three autoregressive models of varying scales: GPT-Neo 125M, Falcon 1B, and Mistral 7B. Through both quantitative metrics and visual diagnostics, we show that token-level variance reliably highlights instability in model outputs and…
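The core signal described in the abstract can be sketched in a few lines. This is a minimal illustration, not the author's implementation: it assumes the per-token log-probabilities have already been collected from K sampled generations of the same prompt, that the generations are compared position by position (truncated to the shortest run), and that a fixed variance threshold marks a token as unstable. The helper names `token_variance` and `flag_tokens` and the threshold value are hypothetical.

```python
from statistics import pvariance
from typing import List, Sequence


def token_variance(logprob_runs: Sequence[Sequence[float]]) -> List[float]:
    """Per-position variance of token log-probabilities across K generations.

    Hypothetical assumption: runs are aligned by position, and only the
    positions shared by every run (the shortest run's length) are scored.
    """
    if len(logprob_runs) < 2:
        raise ValueError("need at least two stochastic generations")
    length = min(len(run) for run in logprob_runs)
    # Population variance of the log-probabilities observed at position t.
    return [pvariance([run[t] for run in logprob_runs]) for t in range(length)]


def flag_tokens(variances: Sequence[float], threshold: float) -> List[int]:
    """Return positions whose variance exceeds a (hypothetical) threshold."""
    return [t for t, v in enumerate(variances) if v > threshold]


if __name__ == "__main__":
    # Toy log-probabilities for three sampled generations of one prompt.
    runs = [
        [-0.1, -0.2, -2.0],
        [-0.1, -0.3, -0.5],
        [-0.1, -0.2, -3.5],
    ]
    variances = token_variance(runs)
    print(flag_tokens(variances, threshold=0.5))  # → [2]
```

Position-wise alignment is the simplest choice; when sampled generations diverge in wording, some token alignment step would be needed before per-position variances are comparable.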
