Hallucination Detection: A Probabilistic Framework Using Embeddings Distance Analysis
Emanuele Ricco, Lorenzo Cima, Roberto Di Pietro

TL;DR
This paper introduces a probabilistic framework, based on embedding distance analysis, for detecting hallucinations in the outputs of large language models (LLMs), and shows that hallucinated and correct content differ structurally.
Contribution
To the best of the authors' knowledge, it is the first work to show that hallucinated content differs measurably from correct content in embedding space, as measured with Minkowski distances, and it uses these differences to build a hallucination detector.
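For reference, the Minkowski distance of order p between two n-dimensional embedding vectors u and v has the standard form below. This summary does not give the paper's own notation or the specific values of p it uses, so the symbols here are assumptions.

```latex
d_p(\mathbf{u}, \mathbf{v}) = \left( \sum_{i=1}^{n} \lvert u_i - v_i \rvert^{p} \right)^{1/p}, \qquad p \ge 1
```

Setting p = 1 gives the Manhattan distance, p = 2 the Euclidean distance, and the limit p → ∞ the Chebyshev distance, max_i |u_i − v_i|.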
Findings
Statistically significant differences in embedding distance distributions between hallucinated and correct content.
Detection tool achieves 66% accuracy, comparable to state-of-the-art methods.
Differences are scale-free: they hold qualitatively regardless of the distance norm used and of the number of keywords, questions, or responses.
Abstract
Hallucinations are one of the major issues affecting LLMs, hindering their wide adoption in production systems. While current hallucination-detection methods are mainly heuristic, in this paper we introduce a mathematically sound methodology for reasoning about hallucinations and use it to build a tool that detects them. To the best of our knowledge, we are the first to show that hallucinated content differs structurally from correct content. To show this, we use Minkowski distances in the embedding space. Our findings demonstrate statistically significant differences in the embedding distance distributions. These differences are also scale-free: they hold qualitatively regardless of the distance norm used and of the number of keywords, questions, or responses. We leverage these structural differences to develop a tool to detect…
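The abstract is cut off before it describes the detector, so the following is only a minimal sketch of the kind of analysis it outlines, assuming each response has already been mapped to an embedding vector. The toy vectors, the helper names (minkowski, pairwise_distances, ks_statistic), and the use of the two-sample Kolmogorov-Smirnov statistic to compare distance distributions are all hypothetical choices for illustration. They are not the authors' pipeline or their statistical test.

```python
# Hypothetical sketch, not the authors' implementation: compare the
# distributions of pairwise Minkowski distances for two groups of embeddings.
import itertools


def minkowski(u, v, p):
    """Minkowski distance of order p >= 1 between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    if p < 1:
        raise ValueError("p must be >= 1 for a valid metric")
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def pairwise_distances(embeddings, p):
    """Distances between every distinct pair of embeddings (upper triangle)."""
    return [minkowski(u, v, p) for u, v in itertools.combinations(embeddings, 2)]


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        # Step past the smallest remaining value in both samples (handles ties),
        # then measure the gap between the two empirical CDFs at that value.
        v = min(xs[i], ys[j])
        while i < n and xs[i] == v:
            i += 1
        while j < m and ys[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


# Invented toy embeddings, deliberately constructed so the groups separate:
# the "correct" vectors cluster tightly, the "hallucinated" ones are spread out.
# Real embeddings would come from an embedding model.
correct = [[0.10, 0.20, 0.30], [0.12, 0.21, 0.29], [0.11, 0.19, 0.31], [0.09, 0.22, 0.30]]
hallucinated = [[0.10, 0.20, 0.30], [0.50, 0.05, 0.40], [0.02, 0.60, 0.10], [0.45, 0.35, 0.90]]

# Repeat the comparison for several norms instead of relying on a single one.
for p in (1, 2, 3):
    stat = ks_statistic(pairwise_distances(correct, p), pairwise_distances(hallucinated, p))
    print(f"p={p}: KS statistic = {stat:.3f}")
```

Looping over several values of p mirrors the scale-free claim: the same comparison is repeated under different norms. With these toy vectors every "correct" distance is smaller than every "hallucinated" one, so the statistic is 1.000 for each p; real data would overlap and would need a proper significance test.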
