H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs
Cheng Gao, Huimin Chen, Chaojun Xiao, Zhiyi Chen, Zhiyuan Liu, Maosong Sun

TL;DR
This paper identifies a tiny subset of neurons in LLMs that reliably predict hallucinations, shows that they play a causal role, and traces their origin to pre-training, advancing understanding of the neural mechanisms behind hallucinations.
Contribution
The study systematically identifies hallucination-associated neurons in LLMs and shows that they are sparse, causally influential, and emerge during pre-training, offering new insight into the neural mechanisms of hallucinations.
Findings
Less than 0.1% of neurons predict hallucinations reliably.
Hallucination-associated neurons causally influence over-compliance behaviors.
These neurons originate during the pre-training phase.
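The summary does not say how the sparse predictive subset is found. A common way to isolate a small set of predictive units is an L1-regularized (sparse) logistic-regression probe over neuron activations, which drives most weights exactly to zero. The stdlib-only sketch below is offered under that assumption and is an illustrative stand-in, not the authors' procedure. The function names, hyperparameters (`l1`, `lr`, `steps`), and the synthetic data (20 "neurons", with the signal planted in neuron 3) are all hypothetical.

```python
import math
import random


def soft_threshold(w, t):
    """Proximal operator of t*|w|: shrink w toward zero by t, clipping to 0."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_sparse_probe(X, y, l1=0.05, lr=0.5, steps=300):
    """L1-regularized logistic regression fitted by proximal gradient descent (ISTA).

    X: one activation vector per response; y: 1 = hallucinated, 0 = faithful.
    Returns (weights, bias); the L1 shrinkage drives most weights exactly to zero.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # sigmoid(z) - label
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the mean logistic loss, then L1 shrinkage (bias is not penalized).
        w = [soft_threshold(wj - lr * gj, lr * l1) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return 1 (predicted hallucination) if the probe's logit is positive."""
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0


# Hypothetical synthetic demo: only neuron 3 carries the hallucination signal.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(200)]
y = [1 if x[3] > 0 else 0 for x in X]

w, b = fit_sparse_probe(X, y)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
accuracy = sum(predict(w, b, x) == yi for x, yi in zip(X, y)) / len(X)
print("selected neurons:", selected, "train accuracy:", accuracy)
```

Raising `l1` keeps fewer neurons at some cost in accuracy. Sweeping this trade-off is one way a "less than 0.1% of neurons" style result could be expressed, though the source does not describe the authors' selection criterion.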
Abstract
Large language models (LLMs) frequently generate hallucinations, outputs that are plausible but factually incorrect, which undermines their reliability. While prior work has examined hallucinations from macroscopic perspectives such as training data and objectives, the underlying neuron-level mechanisms remain largely unexplored. In this paper, we conduct a systematic investigation into hallucination-associated neurons (H-Neurons) in LLMs from three perspectives: identification, behavioral impact, and origins. Regarding identification, we demonstrate that a remarkably sparse subset of neurons (less than 0.1% of total neurons) can reliably predict hallucination occurrences, with strong generalization across diverse scenarios. In terms of behavioral impact, controlled interventions reveal that these neurons are causally linked to over-compliance behaviors. Concerning their origins, we…
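The abstract does not specify how the controlled interventions are carried out. One common form is activation scaling. It multiplies the chosen neurons' activations by a factor alpha: 1.0 leaves the model unchanged, 0.0 ablates the neurons, and values above 1.0 amplify them. The outputs are then compared against the unmodified run. The pure-Python sketch below applies this to a toy ReLU feed-forward block. The function names, weights, and neuron index are hypothetical, and the sketch is not presented as the authors' method.

```python
def relu(v):
    """Elementwise ReLU."""
    return [max(0.0, x) for x in v]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def ffn_forward(x, w_in, w_out, neuron_ids=(), alpha=1.0):
    """Toy feed-forward block whose hidden neurons in `neuron_ids` are scaled by alpha.

    alpha = 1.0 is the unmodified model, 0.0 ablates the chosen neurons,
    and alpha > 1.0 amplifies them.
    """
    h = relu(matvec(w_in, x))
    chosen = set(neuron_ids)
    h = [hj * alpha if j in chosen else hj for j, hj in enumerate(h)]
    return matvec(w_out, h)


# Hypothetical weights: 2 inputs -> 3 hidden neurons -> 1 output.
W_IN = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_OUT = [[0.5, -1.0, 2.0]]
x = [1.0, 2.0]

baseline = ffn_forward(x, W_IN, W_OUT)                                 # [4.5]
ablated = ffn_forward(x, W_IN, W_OUT, neuron_ids=[2], alpha=0.0)       # [-1.5]
amplified = ffn_forward(x, W_IN, W_OUT, neuron_ids=[2], alpha=2.0)     # [10.5]
print(baseline, ablated, amplified)
```

In a real LLM, the same scaling would be applied inside each layer that holds selected neurons while a behavioral metric, such as the rate of over-compliant answers, is tracked across values of alpha. That model-specific wiring is omitted here.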
