A Graph Signal Processing Framework for Hallucination Detection in Large Language Models
Valentin Noël

TL;DR
This paper introduces a spectral graph analysis framework for detecting hallucinations in large language models. It reveals distinct spectral patterns associated with different error types and reaches 88.75% detection accuracy.
Contribution
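It proposes a graph signal processing approach that models transformer layers as attention-induced graphs, treats token embeddings as signals on those graphs, and uses spectral diagnostics to identify hallucinations, providing both theoretical insights and practical detection methods.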
Findings
Factual statements show low-frequency spectral convergence.
Hallucination types have distinct spectral signatures.
Spectral analysis achieves 88.75% detection accuracy.
Abstract
Large language models achieve impressive results, but distinguishing factual reasoning from hallucinations remains challenging. We propose a spectral analysis framework that models transformer layers as dynamic graphs induced by attention, with token embeddings as signals on these graphs. Using graph signal processing, we define diagnostics including Dirichlet energy, spectral entropy, and high-frequency energy ratios, and we establish theoretical connections to computational stability. Experiments across GPT architectures suggest universal spectral patterns: factual statements exhibit consistent "energy mountain" behavior with low-frequency convergence, while different hallucination types show distinct signatures. Logical contradictions destabilize spectra with large effect sizes (), semantic errors remain stable but show connectivity drift, and substitution hallucinations display…
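The three diagnostics named in the abstract can be illustrated with a small NumPy sketch. This is not the author's implementation. The symmetrization W = (A + A^T)/2, the combinatorial Laplacian L = D - W, the `hf_fraction` cutoff that defines "high frequency", and the helper name `spectral_diagnostics` are all assumptions made for illustration. The sketch treats one layer's attention matrix A as a weighted graph over tokens and the token embeddings X as a multichannel signal on it. It expands X in the Laplacian eigenbasis (the graph Fourier transform) and reports the Dirichlet energy tr(X^T L X), the Shannon entropy of the normalized spectral energy, and the share of energy in the highest-frequency modes.

```python
import numpy as np


def spectral_diagnostics(attn: np.ndarray, x: np.ndarray, hf_fraction: float = 0.5) -> dict:
    """Graph-signal diagnostics for a single layer (hypothetical helper).

    attn: (n, n) attention weights, e.g. averaged over heads (assumption).
    x:    (n, d) token embeddings, treated as a d-channel graph signal.
    hf_fraction: assumed cutoff; the top `hf_fraction` of eigenmodes,
                 ordered by eigenvalue, count as "high frequency".
    """
    # Assumption: symmetrize directed attention into an undirected adjacency.
    w = 0.5 * (attn + attn.T)
    # Combinatorial Laplacian L = D - W (self-loops cancel on the diagonal).
    lap = np.diag(w.sum(axis=1)) - w

    # Graph Fourier basis: eigh returns eigenvalues in ascending order,
    # so column 0 is the smoothest (lowest-frequency) mode.
    eigvals, eigvecs = np.linalg.eigh(lap)
    x_hat = eigvecs.T @ x                    # graph Fourier coefficients, (n, d)
    energy = np.sum(x_hat ** 2, axis=1)      # signal energy per frequency, (n,)
    total = energy.sum()
    if total == 0:
        raise ValueError("signal has zero energy")

    # Dirichlet energy: tr(X^T L X) = 1/2 * sum_ij w_ij * ||x_i - x_j||^2.
    dirichlet = float(np.trace(x.T @ lap @ x))

    # Shannon entropy of the normalized energy distribution over frequencies.
    p = energy / total
    p = p[p > 0]
    entropy = float(-(p * np.log(p)).sum())

    # Fraction of total energy carried by the highest-frequency modes.
    k0 = int(np.floor(len(eigvals) * (1.0 - hf_fraction)))
    hf_ratio = float(energy[k0:].sum() / total)

    return {
        "dirichlet": dirichlet,
        "spectral_entropy": entropy,
        "hf_ratio": hf_ratio,
        "eigvals": eigvals,
        "energy": energy,
    }


# Usage on random data: a row-softmax "attention" matrix over 6 tokens
# and 4-dimensional embeddings.
rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 6))
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
x = rng.normal(size=(6, 4))
diag = spectral_diagnostics(attn, x)
```

Computing these quantities layer by layer is one plausible way to trace the low-frequency convergence described above. A signal that becomes smoother over the attention graph has lower Dirichlet energy and a smaller high-frequency ratio. The paper's exact definitions, normalizations, and cutoffs may differ.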
Taxonomy
Topics: Advanced Graph Neural Networks · Topic Modeling · Adversarial Robustness in Machine Learning
