From Flat Facts to Sharp Hallucinations: Detecting Stubborn Errors via Gradient Sensitivity
Yee Zhing Liew, Andrew Huey Ping Tan, Anwar P.P. Abdul Majeed

TL;DR
This paper introduces EPGS, a geometric method that detects stubborn hallucinations in LLMs by measuring gradient sensitivity to identify sharp minima associated with confident errors.
Contribution
The paper proposes EPGS, an embedding-perturbed gradient sensitivity technique that detects stubborn hallucinations by adding noise to input embeddings and measuring how sharply the gradient magnitude responds, which the authors treat as a signature of sharp minima.
Findings
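EPGS outperforms entropy-based and representation-based baselines in detecting high-confidence errors.
The gradient-magnitude spike under Gaussian perturbation of the input embeddings serves as an efficient proxy for the Hessian spectrum, separating sharp minima from flat ones.
EPGS provides a robust signal for identifying high-confidence factual errors in LLMs.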
Abstract
Traditional hallucination detection fails on "Stubborn Hallucinations": errors where LLMs are confidently wrong. We propose a geometric solution: Embedding-Perturbed Gradient Sensitivity (EPGS). We hypothesize that while robust facts reside in flat minima, stubborn hallucinations sit in sharp minima, supported by brittle memorization. EPGS detects this sharpness by perturbing input embeddings with Gaussian noise and measuring the resulting spike in gradient magnitude. This acts as an efficient proxy for the Hessian spectrum, differentiating stable knowledge from unstable memorization. Our experiments show that EPGS significantly outperforms entropy-based and representation-based baselines, providing a robust signal for detecting high-confidence factual errors.
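The core idea in the abstract (perturb with Gaussian noise, then measure the spike in gradient magnitude) can be illustrated on a toy problem. The sketch below is not the authors' implementation: it replaces the LLM loss with a hypothetical quadratic loss whose curvature is known, and the names `grad_quadratic`, `gradient_sensitivity`, `sigma`, and `n_samples` are assumptions chosen for illustration. For a quadratic, the gradient after a perturbation δ changes by exactly Hδ, so the measured spike should scale with the curvature, which is the sense in which it proxies the Hessian.

```python
import math
import random


def grad_quadratic(x, center, curvature):
    """Gradient of the toy loss L(x) = 0.5 * curvature * ||x - center||^2."""
    return [curvature * (xi - ci) for xi, ci in zip(x, center)]


def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))


def gradient_sensitivity(grad_fn, x, sigma=0.01, n_samples=32, seed=0):
    """Mean increase in gradient norm after Gaussian perturbation of x.

    Hypothetical stand-in for an EPGS-style score: x plays the role of the
    input embedding, grad_fn the role of the model's loss gradient.
    """
    rng = random.Random(seed)  # fixed seed so both minima see the same noise
    base = l2_norm(grad_fn(x))
    total = 0.0
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        total += l2_norm(grad_fn(noisy)) - base
    return total / n_samples


# Two minima at the same point, differing only in curvature (assumed values).
center = [0.0] * 8
flat = gradient_sensitivity(lambda x: grad_quadratic(x, center, 1.0), center)
sharp = gradient_sensitivity(lambda x: grad_quadratic(x, center, 50.0), center)

print(f"flat-minimum score:  {flat:.5f}")
print(f"sharp-minimum score: {sharp:.5f}")
```

Because both calls use the same seed, the sharp score is 50 times the flat one, matching the curvature ratio. In a real model the loss is not quadratic and the gradient would come from automatic differentiation with respect to the embeddings, so the relationship would only hold approximately for small `sigma`.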
