Lyapunov Probes for Hallucination Detection in Large Foundation Models
Bozhi Luan, Gen Li, Yalan Qin, Jifeng Guo, Yun Zhou, Faguo Wu, Hongwei Zheng, Wenjun Wu, Zhaoxin Fan

TL;DR
This paper introduces Lyapunov Probes, a novel method for detecting hallucinations in large language models (LLMs) and multimodal LLMs (MLLMs). It uses dynamical systems stability theory to analyze how stable the models' internal representations remain under perturbation, yielding more reliable hallucination detection.
Contribution
The paper proposes Lyapunov Probes, a stability-based approach to hallucination detection in LLMs and MLLMs that combines derivative-based stability constraints with systematic perturbation analysis.
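The perturbation-analysis idea can be illustrated with a minimal sketch that measures how a probe's confidence changes as perturbations grow. It assumes, for illustration only, that perturbations are isotropic Gaussian noise added to a hidden representation and that the probe is a single logistic-regression layer. The names `perturbation_profile`, `h`, `w`, `b`, and `sigmas` are hypothetical and do not come from the paper.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping probe logits to confidences in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def perturbation_profile(h, w, b, sigmas, n_samples=64, seed=0):
    """Mean probe confidence at each perturbation scale (hypothetical helper).

    h: (d,) hidden representation of a query/answer pair (assumed input).
    w, b: weights and bias of a hypothetical linear probe.
    sigmas: increasing standard deviations of isotropic Gaussian noise
        (an assumed perturbation scheme, not necessarily the paper's).
    """
    rng = np.random.default_rng(seed)
    h = np.asarray(h, dtype=float)
    profile = []
    for sigma in sigmas:
        # Draw n_samples noisy copies of h at this noise scale.
        noise = rng.normal(0.0, sigma, size=(n_samples, h.shape[0]))
        logits = (h + noise) @ w + b
        # Average confidence over the noisy copies.
        profile.append(sigmoid(logits).mean())
    return np.array(profile)


h = np.ones(8)
w = np.full(8, 0.5)
b = -1.0
print(perturbation_profile(h, w, b, sigmas=[0.0, 0.5, 2.0, 5.0]))
```

At `sigma = 0` the profile equals the unperturbed confidence. For a representation near a stable point, the curve should stay high for small noise and fall off only gradually.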
Findings
Lyapunov Probes outperform existing baselines in hallucination detection.
The method effectively identifies unstable, hallucination-prone regions.
Experiments show consistent improvements across diverse datasets and models.
Abstract
We address hallucination detection in Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) by framing the problem through the lens of dynamical systems stability theory. Rather than treating hallucination as a straightforward classification task, we conceptualize (M)LLMs as dynamical systems, where factual knowledge is represented by stable equilibrium points within the representation space. Our main insight is that hallucinations tend to arise at the boundaries of knowledge-transition regions separating stable and unstable zones. To capture this phenomenon, we propose Lyapunov Probes: lightweight networks trained with derivative-based stability constraints that enforce a monotonic decay in confidence under input perturbations. By performing systematic perturbation analysis and applying a two-stage training process, these probes reliably distinguish between stable…
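One way to express "monotonic decay in confidence under input perturbations" as a derivative-based constraint is a hinge penalty on finite-difference slopes of the confidence profile. The following is a minimal sketch under that assumption. The paper's exact loss, its two-stage schedule, and the name `monotonic_decay_penalty` are not given here; the function name and `margin` parameter are hypothetical. NumPy is used only to show the arithmetic. Actual probe training would compute the same quantity in an autograd framework so the penalty can be backpropagated.

```python
import numpy as np


def monotonic_decay_penalty(confidences, sigmas, margin=0.0):
    """Hinge penalty on positive slopes of confidence vs. perturbation scale.

    A profile that never rises as perturbation grows gets zero penalty
    (with margin=0). A positive margin additionally demands strict decay.
    """
    c = np.asarray(confidences, dtype=float)
    s = np.asarray(sigmas, dtype=float)
    if c.ndim != 1 or c.shape != s.shape or c.size < 2:
        raise ValueError("need two aligned 1-D arrays with at least 2 entries")
    if np.any(np.diff(s) <= 0):
        raise ValueError("sigmas must be strictly increasing")
    # Finite-difference approximation of d(confidence)/d(sigma).
    slopes = np.diff(c) / np.diff(s)
    # Penalize only segments whose slope exceeds -margin.
    return np.maximum(slopes + margin, 0.0).mean()


scales = [0.0, 0.5, 1.0, 1.5]
print(monotonic_decay_penalty([0.9, 0.8, 0.6, 0.5], scales))  # 0.0: monotonic decay
print(monotonic_decay_penalty([0.9, 0.7, 0.8, 0.5], scales))  # ~0.0667: one rising segment
```

In a training loop, this term would typically be added to the probe's classification loss with a weighting coefficient (a hypothetical hyperparameter here), so that the probe learns both to separate factual from hallucinated outputs and to behave stably under perturbation.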
