Detecting Hallucinations in Large Language Model Generation: A Token Probability Approach
Ernesto Quevedo, Jorge Yero, Rachel Koerner, Pablo Rivas, Tomas Cerny

TL;DR
This paper presents a resource-efficient supervised learning method for detecting hallucinations in LLM outputs using simple token probability features, outperforming existing approaches across multiple benchmarks.
Contribution
Introduces a novel, lightweight detection approach using only four numerical features from token probabilities, avoiding complex linguistic analyses and extensive LLM reliance.
Findings
Outperforms state-of-the-art hallucination detection methods
Effective across three different benchmark datasets
Highlights importance of feature selection and evaluator LLM choice
Abstract
Concerns regarding the propensity of Large Language Models (LLMs) to produce inaccurate outputs, also known as hallucinations, have escalated. Detecting them is vital for ensuring the reliability of applications relying on LLM-generated content. Current methods often demand substantial resources: they rely on extensive LLMs, or employ supervised learning with multidimensional features or intricate linguistic and semantic analyses that are difficult to reproduce, and they largely depend on using the same LLM that hallucinated. This paper introduces a supervised learning approach that employs two simple classifiers using only four numerical features derived from token and vocabulary probabilities obtained from other LLM evaluators, which need not be the same model that produced the text. The method yields promising results, surpassing state-of-the-art outcomes in multiple tasks across three different benchmarks.…
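To make the idea of "four numerical features derived from token and vocabulary probabilities" concrete, here is a minimal sketch. The abstract does not name the four features, so the statistics below (minimum and mean token probability, largest gap to the evaluator's top choice, smallest spread of the distribution) are assumptions chosen for illustration. The names `TokenStep` and `token_probability_features` are likewise hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TokenStep:
    """Evaluator-LLM probabilities at one position of the generated text.

    Hypothetical container: the field names are illustrative assumptions.
    """
    chosen_prob: float            # probability the evaluator gives the generated token
    vocab_probs: Sequence[float]  # evaluator probabilities over the vocabulary (or a top-k slice)


def token_probability_features(steps: Sequence[TokenStep]) -> List[float]:
    """Collapse per-token probabilities into four scalar features.

    The four statistics are an assumed, plausible feature set; the abstract
    only says that four numerical features are used.
    """
    if not steps:
        raise ValueError("need at least one token step")

    chosen = [s.chosen_prob for s in steps]

    # 1. Lowest probability assigned to any generated token.
    min_prob = min(chosen)
    # 2. Mean probability of the generated tokens.
    avg_prob = sum(chosen) / len(chosen)
    # 3. Largest gap between the evaluator's top choice and the generated token.
    max_deviation = max(max(s.vocab_probs) - s.chosen_prob for s in steps)
    # 4. Smallest spread (max minus min) of the evaluator's distribution.
    min_spread = min(max(s.vocab_probs) - min(s.vocab_probs) for s in steps)

    return [min_prob, avg_prob, max_deviation, min_spread]


# Toy input, invented purely for illustration.
example_steps = [
    TokenStep(chosen_prob=0.5, vocab_probs=[0.5, 0.25, 0.25]),
    TokenStep(chosen_prob=0.25, vocab_probs=[0.5, 0.25, 0.25]),
]
example_features = token_probability_features(example_steps)
```

A feature vector like this, one per generated answer, could then be fed to any small binary classifier trained on labeled hallucinated and non-hallucinated examples. Because the features depend only on probabilities, the evaluator model that supplies them does not have to be the model that generated the text.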
Taxonomy
Topics: Machine Learning in Healthcare · Mental Health via Writing
