HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs
Xinyue Zeng, Junhong Lin, Yujun Yan, Feng Guo, Liang Shi, Jun Wu, Dawei Zhou

TL;DR
HalluGuard introduces a unified theoretical framework and an NTK-based detection method to identify both data-driven and reasoning-driven hallucinations in LLMs, improving reliability in high-stakes applications.
Contribution
The paper presents the Hallucination Risk Bound framework and HalluGuard, a novel NTK-based score for joint detection of different hallucination types in LLMs, with extensive evaluation.
Findings
HalluGuard achieves state-of-the-art detection performance.
The framework effectively decomposes hallucination sources.
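HalluGuard generalizes across diverse benchmarks and models (the reviews cite evaluation on 10 benchmarks, against 11 baselines, and across 9 LLM architectures).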
Abstract
The reliability of Large Language Models (LLMs) in high-stakes domains such as healthcare, law, and scientific discovery is often compromised by hallucinations. These failures typically stem from two sources: data-driven hallucinations and reasoning-driven hallucinations. However, existing detection methods usually address only one source and rely on task-specific heuristics, limiting their generalization to complex scenarios. To overcome these limitations, we introduce the Hallucination Risk Bound, a unified theoretical framework that formally decomposes hallucination risk into data-driven and reasoning-driven components, linked respectively to training-time mismatches and inference-time instabilities. This provides a principled foundation for analyzing how hallucinations emerge and evolve. Building on this foundation, we introduce HalluGuard, an NTK-based score that leverages the…
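To make the "NTK-based" terminology concrete, here is a minimal sketch of an empirical neural tangent kernel (NTK) Gram matrix and its condition number for a toy network. This is not the HalluGuard score, whose definition is cut off in the abstract above. The two-layer tanh model, the random inputs, and every name in the code (`param_gradient`, `empirical_ntk`, `W1`, `w2`) are assumptions chosen only for illustration.

```python
import numpy as np

def param_gradient(x, W1, w2):
    """Gradient of the toy model f(x) = w2 . tanh(W1 x) w.r.t. all parameters, flattened."""
    h = np.tanh(W1 @ x)                          # hidden activations, shape (hidden,)
    grad_w2 = h                                  # df/dw2
    grad_W1 = np.outer(w2 * (1.0 - h ** 2), x)   # df/dW1, shape (hidden, d)
    return np.concatenate([grad_W1.ravel(), grad_w2])

def empirical_ntk(X, W1, w2):
    """Empirical NTK Gram matrix: K[i, j] = <grad f(x_i), grad f(x_j)>."""
    J = np.stack([param_gradient(x, W1, w2) for x in X])  # (n, n_params)
    return J @ J.T

# Hypothetical toy setup: 5 random inputs, a 4 -> 16 -> 1 network.
rng = np.random.default_rng(0)
d, hidden, n = 4, 16, 5
W1 = rng.normal(size=(hidden, d)) / np.sqrt(d)
w2 = rng.normal(size=hidden) / np.sqrt(hidden)
X = rng.normal(size=(n, d))

K = empirical_ntk(X, W1, w2)

# 2-norm condition number of the kernel matrix (largest / smallest singular value).
kappa = np.linalg.cond(K)

# A term of the form -log(kappa^2), as quoted in one of the reviews;
# how the paper combines it with other terms is not shown here.
neg_log_kappa_sq = -np.log(kappa ** 2)

print("NTK shape:", K.shape)
print("condition number:", kappa)
print("-log(kappa^2):", neg_log_kappa_sq)
```

The kernel matrix is symmetric and positive semidefinite by construction, so its condition number is at least 1 and the `-log(kappa^2)` term is never positive. A poorly conditioned kernel therefore lowers any score that adds this term.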
Peer Reviews
Decision: ICLR 2026 Poster
1) The primary strength of the paper is that it casts the hallucination detection problem in a formal theoretical framework and decomposes the hallucination risk into data-driven and reasoning-driven sources. This lays appropriate groundwork for analysing the sources of hallucinations themselves, an analysis that is otherwise often heuristic at best. 2) The mathematical framework is introduced in a clear and lucid manner, though some terms (e.g., the condition number term) are introduced with limited motivation. Furtherm
1) While the core theory as presented in Theorem 3.2 decomposes risk into two terms (data-driven and reasoning-driven), the final score is an additive combination of three terms. The inclusion of the third term ($-\log \kappa^2$), while explored in the appendix as a penalty, appears to be arbitrary and appended without adequate motivation overall. Furthermore, the paper relies on a Freedman inequality to show that the reasoning-driven error term grows exponentially with sequence length $T$. Howeve
1. The paper introduces a novel theoretical framework, the Hallucination Risk Bound (HRB), that unifies data-driven and reasoning-driven hallucinations under a single mathematical formulation—a first in this research area. 2. The work is methodologically rigorous, combining solid theoretical derivations with extensive empirical validation across 10 benchmarks, 11 baselines, and 9 LLM architectures. 3. The paper is exceptionally clear and well-organized, balancing technical precision with intuiti
1. While the theoretical formulation is elegant, the connection between NTK geometry and hallucination phenomena could be further deepened by clarifying how kernel dynamics specifically capture semantic drift and logical inconsistency beyond representational similarity. 2. Although the experiments are broad, the evaluation largely focuses on detection performance metrics (AUROC, AUPRC), leaving limited insight into causal behavior—whether reducing HALLUGUARD score indeed prevents hallucinations
- Admittedly, I have not checked the math thoroughly, but based on my understanding, HalluGuard is well motivated with sound mathematical justification. - I really appreciate the clarity with which the authors address the efficacy of HalluGuard compared to existing methods in the literature. The various dimensions along which they measure the performance of HalluGuard are well stated and, more importantly, extensive experimental results are provided in favor of HalluGuard.
- Some justification for the assumptions in Section 3.2 would be nice to see. I did not see any references, discussions or proofs for the validity or the practical applicability of the assumptions. **Minor:** There is some text overlap in the headings of Table 4
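The reviews above refer to the detection metrics AUROC and AUPRC. As a reference point, here is a minimal sketch of how these are commonly computed from per-response detector scores. The scores and labels are made-up illustration data, not results from the paper. The convention that label 1 means "hallucinated" and that a higher score means "more likely hallucinated" is an assumption, and average precision is used as the usual step-wise estimate of AUPRC.

```python
def auroc(scores, labels):
    """Probability that a random positive scores above a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def average_precision(scores, labels):
    """Mean of precision values at the rank of each positive (step-wise AUPRC estimate)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return total / hits

# Hypothetical detector outputs: 1 = hallucinated, 0 = faithful.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 1, 0, 0]

roc = auroc(scores, labels)
ap = average_precision(scores, labels)
print(f"AUROC = {roc:.3f}, AP = {ap:.3f}")
```

Both metrics depend only on how the detector ranks responses, which is why, as one reviewer notes, they say little about whether lowering the score would actually prevent hallucinations.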
Taxonomy
Topics: Advanced Graph Neural Networks · Machine Learning in Healthcare · Adversarial Robustness in Machine Learning
