HD-NDEs: Neural Differential Equations for Hallucination Detection in LLMs
Qing Li, Jiahui Geng, Zongxiong Chen, Derui Zhu, Yuxia Wang, Congbo Ma, Chenyang Lyu, Fakhri Karray

TL;DR
This paper introduces HD-NDEs, a method based on neural differential equations that detects hallucinations in large language models by modeling the dynamics of their latent space. It reports an AUC-ROC improvement of over 14% on the True-False dataset.
Contribution
The paper presents a novel neural differential equations approach for hallucination detection in LLMs, capturing full model dynamics for more reliable truth assessment.
Findings
Over 14% improvement in AUC-ROC on the True-False dataset
Effective across five datasets and six LLMs
Outperforms existing classification-based methods
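For readers unfamiliar with the metric in the first finding, AUC-ROC is the probability that a randomly chosen true statement receives a higher detector score than a randomly chosen false one. The sketch below computes it by direct pairwise comparison. The labels and scores are made-up toy values for illustration only, not results from the paper.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = true statement, 0 = hallucinated statement.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]
auc = auc_roc(labels, scores)
print(f"AUC-ROC = {auc:.3f}")  # prints "AUC-ROC = 0.778"
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation.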
Abstract
In recent years, large language models (LLMs) have made remarkable advancements, yet hallucination, where models produce inaccurate or non-factual statements, remains a significant challenge for real-world deployment. Although current classification-based methods, such as SAPLMA, are highly efficient in mitigating hallucinations, they struggle when non-factual information arises early or in the middle of the output sequence, which reduces their reliability. To address these issues, we propose Hallucination Detection-Neural Differential Equations (HD-NDEs), a novel method that systematically assesses the truthfulness of statements by capturing the full dynamics of LLMs within their latent space. Our approach applies neural differential equations (Neural DEs) to model the dynamic system in the latent space of LLMs. Then, the sequence in the latent space is mapped to the classification space for…
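The two steps the abstract describes (evolving a latent state with a neural differential equation, then mapping the resulting sequence to a classification space) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a small untrained MLP as the vector field, fixed-step Euler integration, a logistic readout on the final state, and random vectors as stand-ins for per-token LLM hidden states. All function names and dimensions are hypothetical.

```python
import math
import random


def make_mlp(in_dim, hidden_dim, out_dim, rng):
    """Random, untrained two-layer MLP weights standing in for a learned vector field."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hidden_dim)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(out_dim)]
    return w1, w2


def mlp(params, x):
    """Apply the MLP: tanh hidden layer followed by a linear output layer."""
    w1, w2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return [sum(w * hi for w, hi in zip(row, hidden)) for row in w2]


def integrate(params, z0, token_states, step=0.1, substeps=5):
    """Euler-integrate dz/dt = f(z, x_t), where x_t is the current token's hidden state.

    Returns the latent trajectory: the initial state plus one state per token.
    """
    z = list(z0)
    trajectory = [list(z)]
    for x in token_states:
        for _ in range(substeps):
            dz = mlp(params, z + x)  # vector field sees state and token input
            z = [zi + step * di for zi, di in zip(z, dz)]
        trajectory.append(list(z))
    return trajectory


def truth_probability(readout, trajectory):
    """Map the final latent state to a probability with a logistic readout."""
    logit = sum(w * zi for w, zi in zip(readout, trajectory[-1]))
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
state_dim, embed_dim = 4, 3  # illustrative sizes, far smaller than real LLM states
field = make_mlp(state_dim + embed_dim, 8, state_dim, rng)
readout = [rng.uniform(-1.0, 1.0) for _ in range(state_dim)]

# Hypothetical stand-ins for the hidden states of a six-token statement.
token_states = [[rng.gauss(0.0, 1.0) for _ in range(embed_dim)] for _ in range(6)]
trajectory = integrate(field, [0.0] * state_dim, token_states)
p_true = truth_probability(readout, trajectory)
print(f"{len(trajectory)} latent states, p(true) = {p_true:.3f}")
```

Because every token contributes to the trajectory, information from early or mid-sequence tokens can influence the final state. In practice the vector field and readout would be trained on labeled statements, and an adaptive solver would likely replace the Euler steps.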
Taxonomy
Topics: Computational Drug Discovery Methods
