TL;DR
This paper introduces SIVR, an uncertainty estimation method for LLMs that uses token-wise, layer-wise internal variance to detect hallucinations more effectively and to generalize across models and tasks.
Contribution
SIVR is a model-agnostic framework that uses the dispersion of internal representations across layers to improve hallucination detection without relying on large training datasets.
Findings
SIVR outperforms existing baselines in hallucination detection.
It generalizes well across models and tasks.
It avoids reliance on large training datasets.
Abstract
Uncertainty estimation is a promising approach to detecting hallucinations in large language models (LLMs). Recent approaches commonly depend on the model's internal states to estimate uncertainty. However, they rely on strict assumptions about how hidden states should evolve across layers, and they lose information by focusing solely on the last token or the mean over tokens. To address these issues, we present Sequential Internal Variance Representation (SIVR), a supervised hallucination detection framework that leverages token-wise, layer-wise features derived from hidden states. SIVR adopts a more basic assumption: uncertainty manifests in the degree of dispersion, or variance, of internal representations across layers. Because it does not rely on more specific assumptions, the method is model- and task-agnostic. It additionally aggregates the full sequence of per-token variance features, learning temporal…
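To make the idea of token-wise, layer-wise variance concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function name `token_layer_variance`, the nested-list layout `hidden_states[layer][token][dim]`, and the choice to average the per-dimension variance into one score per token are all assumptions made for illustration. The abstract does not specify how the features are computed or how the resulting sequence is aggregated.

```python
from statistics import pvariance


def token_layer_variance(hidden_states):
    """Return one dispersion score per token (hypothetical helper).

    hidden_states[l][t][d] is the activation at layer l, token t,
    hidden dimension d. For each token, the variance across layers is
    computed per dimension and then averaged over dimensions
    (the averaging step is an assumption, not taken from the paper).
    """
    num_layers = len(hidden_states)
    num_tokens = len(hidden_states[0])
    dim = len(hidden_states[0][0])

    scores = []
    for t in range(num_tokens):
        per_dim = [
            pvariance([hidden_states[l][t][d] for l in range(num_layers)])
            for d in range(dim)
        ]
        scores.append(sum(per_dim) / dim)
    return scores


# Toy input: 3 layers, 2 tokens, 2 hidden dimensions.
# Token 0 is identical in every layer; token 1 drifts in dimension 0.
toy = [
    [[1.0, 1.0], [0.0, 0.0]],
    [[1.0, 1.0], [2.0, 0.0]],
    [[1.0, 1.0], [4.0, 0.0]],
]
features = token_layer_variance(toy)
```

In this toy input, the token whose representation stays constant across layers gets a score of zero, while the token whose representation changes from layer to layer gets a positive score. The output is a per-token sequence rather than a single pooled value, which matches the abstract's point about keeping the full sequence instead of only the last or mean token.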
