Lyapunov Probes for Hallucination Detection in Large Foundation Models

Bozhi Luan; Gen Li; Yalan Qin; Jifeng Guo; Yun Zhou; Faguo Wu; Hongwei Zheng; Wenjun Wu; Zhaoxin Fan

arXiv:2603.06081 · cs.CV · March 9, 2026

Open Access

TL;DR

This paper introduces Lyapunov Probes, which detect hallucinations in LLMs and MLLMs by testing how stable the model's internal representations remain under input perturbations, a framing drawn from dynamical systems stability theory.

Contribution

The paper proposes Lyapunov Probes, a new stability-based approach for hallucination detection in LLMs and MLLMs, leveraging derivative constraints and systematic perturbation analysis.

Findings

01. Lyapunov Probes outperform existing baselines in hallucination detection.

02. The method effectively identifies unstable, hallucination-prone regions.

03. Experiments show consistent improvements across diverse datasets and models.

Abstract

We address hallucination detection in Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) by framing the problem through the lens of dynamical systems stability theory. Rather than treating hallucination as a straightforward classification task, we conceptualize (M)LLMs as dynamical systems, where factual knowledge is represented by stable equilibrium points within the representation space. Our main insight is that hallucinations tend to arise at the boundaries of knowledge-transition regions separating stable and unstable zones. To capture this phenomenon, we propose Lyapunov Probes: lightweight networks trained with derivative-based stability constraints that enforce a monotonic decay in confidence under input perturbations. By performing systematic perturbation analysis and applying a two-stage training process, these probes reliably distinguish between stable…
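The abstract describes the stability constraint only in words, so the following is a minimal sketch of one way a "monotonic decay in confidence under input perturbations" could be penalized. It is not the authors' implementation: the linear sigmoid probe, the finite-difference hinge penalty (used here in place of an explicit derivative constraint), and all names (`probe_confidence`, `perturb`, `monotonic_decay_penalty`, `scales`) are assumptions made for illustration.

```python
import math

def probe_confidence(weights, bias, hidden):
    """Hypothetical probe: linear layer plus sigmoid, returning a confidence in (0, 1)."""
    z = sum(w * h for w, h in zip(weights, hidden)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def perturb(hidden, direction, scale):
    """Shift a hidden representation by `scale` along a fixed perturbation direction."""
    return [h + scale * d for h, d in zip(hidden, direction)]

def monotonic_decay_penalty(weights, bias, hidden, direction, scales):
    """Hinge penalty on every rise in confidence between consecutive perturbation scales.

    `scales` is assumed to be sorted in increasing order. The penalty is zero
    exactly when confidence never increases as the perturbation grows.
    """
    confs = [
        probe_confidence(weights, bias, perturb(hidden, direction, s))
        for s in scales
    ]
    return sum(max(0.0, later - earlier) for earlier, later in zip(confs, confs[1:]))

# Toy values, chosen only to exercise the functions (not taken from the paper).
weights, bias = [1.0, -0.5], 0.0
hidden = [2.0, 1.0]
scales = [0.0, 0.5, 1.0, 2.0]

# Perturbing against the probe's weight vector lowers confidence: no penalty.
penalty_decay = monotonic_decay_penalty(weights, bias, hidden, [-1.0, 0.0], scales)
# Perturbing along it raises confidence: the constraint is violated and penalized.
penalty_rise = monotonic_decay_penalty(weights, bias, hidden, [1.0, 0.0], scales)
print(penalty_decay, penalty_rise)
```

In a training loop, a term like this would presumably be added to an ordinary classification loss so that the probe learns both to separate factual from hallucinated outputs and to lose confidence as inputs drift away from them. How the paper weights and schedules such a term across its two training stages is not stated in the abstract.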

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Graph Neural Networks · Mental Health via Writing