Probabilistic distances-based hallucination detection in LLMs with RAG
Rodion Oblovatny, Alexandra Kuleshova, Konstantin Polev, Alexey Zaytsev

TL;DR
This paper presents an unsupervised method for detecting hallucinations in large language models within RAG systems. It measures distances between the distributions of prompt token embeddings and response token embeddings, and reports state-of-the-art or competitive results.
Contribution
The authors introduce a distance-based hallucination detection technique tailored for RAG systems, leveraging geometric analysis of token embeddings for improved factuality assessment.
Findings
Achieves state-of-the-art or competitive performance in hallucination detection.
Demonstrates transferability from NLI tasks to hallucination detection.
Provides an unsupervised, efficient detection method.
Abstract
Detecting hallucinations in large language models (LLMs) is critical for their safety in many applications. Without proper detection, these systems often provide harmful, unreliable answers. In recent years, LLMs have been actively used in retrieval-augmented generation (RAG) settings. However, hallucinations remain even in this setting, and while numerous hallucination detection methods have been proposed, most approaches are not specifically designed for RAG systems. To overcome this limitation, we introduce a hallucination detection method based on estimating the distances between the distributions of prompt token embeddings and language model response token embeddings. The method examines the geometric structure of token hidden states to reliably extract a signal of factuality in text, while remaining friendly to long sequences. Extensive experiments demonstrate that our method…
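The core idea in the abstract, scoring a response by how far its token-embedding distribution sits from the prompt's, can be illustrated with a minimal sketch. The excerpt does not say which probabilistic distance the authors use, so the energy distance below is a stand-in chosen for illustration, and the names `energy_distance`, `prompt_states`, and `response_states` are hypothetical. The inputs are assumed to be per-token hidden-state vectors already extracted from a model.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def _mean_pairwise_distance(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """Average Euclidean distance over all pairs (x, y) with x in a, y in b."""
    total = 0.0
    for x in a:
        for y in b:
            total += math.dist(x, y)
    return total / (len(a) * len(b))


def energy_distance(prompt_states: Sequence[Vector],
                    response_states: Sequence[Vector]) -> float:
    """Energy distance between two sets of token embeddings.

    ASSUMPTION: this is an illustrative stand-in, not the distance
    used in the paper. It computes
        2 * E||X - Y|| - E||X - X'|| - E||Y - Y'||
    over the empirical distributions and is zero when both sets match.
    """
    if not prompt_states or not response_states:
        raise ValueError("both token sets must be non-empty")
    cross = _mean_pairwise_distance(prompt_states, response_states)
    within_prompt = _mean_pairwise_distance(prompt_states, prompt_states)
    within_response = _mean_pairwise_distance(response_states, response_states)
    return 2.0 * cross - within_prompt - within_response


# Toy 2-D "embeddings": one response close to the prompt, one far away.
prompt = [(0.0, 0.0), (1.0, 0.0)]
grounded = [(0.0, 0.0), (1.0, 0.0)]
drifted = [(10.0, 0.0), (11.0, 0.0)]

score_grounded = energy_distance(prompt, grounded)
score_drifted = energy_distance(prompt, drifted)
```

In this toy setup a larger score means the response embeddings have moved further from the prompt embeddings, which is the kind of signal a threshold could turn into a hallucination flag. Note that this naive version is quadratic in the number of tokens, so it does not reflect the long-sequence efficiency the abstract claims for the actual method.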
Taxonomy
Topics: Topic Modeling · Mental Health via Writing · Natural Language Processing Techniques
