H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs

Cheng Gao; Huimin Chen; Chaojun Xiao; Zhiyi Chen; Zhiyuan Liu; Maosong Sun

arXiv:2512.01797 · cs.AI · December 3, 2025

TL;DR

This paper identifies a sparse subset of neurons in LLMs (less than 0.1% of the total) whose activity reliably predicts hallucinations, shows through controlled interventions that these neurons are causally linked to over-compliance behaviors, and traces their origin to pre-training.

Contribution

The study systematically investigates hallucination-associated neurons (H-Neurons) in LLMs from three perspectives: identification, behavioral impact, and origins. It reports that these neurons are sparse, causally influence model behavior, and emerge during pre-training.

Findings

01 Less than 0.1% of total neurons reliably predict hallucinations.

02 Hallucination-associated neurons causally influence over-compliance behaviors.

03 These neurons originate during the pre-training phase.
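
To make the first finding concrete, here is a minimal sketch of how a sparse set of predictive neurons could be picked out with an L1-regularized linear probe. This summary does not describe the authors' identification procedure, so the technique shown (logistic regression fitted by proximal gradient descent) is a generic stand-in, not the paper's method. The data is synthetic, and every name in it (`fit_sparse_probe`, `planted`, the matrix sizes, `lam`) is an assumption made for illustration.

```python
import numpy as np

def fit_sparse_probe(X, y, lam=0.1, lr=0.5, n_iter=300):
    """L1-regularized logistic regression fitted by proximal gradient descent.

    X: (n_samples, n_neurons) activations; y: 0/1 labels (1 = hallucination).
    Generic sparse probe for illustration, not the paper's procedure.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        z = np.clip(X @ w + b, -30.0, 30.0)   # clip to keep exp() stable
        p = 1.0 / (1.0 + np.exp(-z))          # predicted probability
        residual = p - y
        # Gradient step on the logistic loss.
        w = w - lr * (X.T @ residual) / n_samples
        # Soft-thresholding: the proximal step for the L1 penalty.
        # It sets small weights exactly to zero, which yields sparsity.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
        b -= lr * residual.mean()
    return w, b

# Synthetic stand-in for recorded activations (hypothetical sizes).
rng = np.random.default_rng(0)
n_samples, n_neurons = 400, 500
planted = [7, 123, 400]                       # neurons that carry the signal
X = rng.standard_normal((n_samples, n_neurons))
signal = X[:, planted].sum(axis=1) + 0.5 * rng.standard_normal(n_samples)
y = (signal > 0).astype(float)

w, b = fit_sparse_probe(X, y)
selected = np.flatnonzero(w)                  # neurons with nonzero weight
accuracy = ((X @ w + b > 0) == (y == 1)).mean()
print("selected neurons:", selected.tolist())
print("fraction selected:", len(selected) / n_neurons)
print("training accuracy:", accuracy)
```

The L1 penalty is what drives most weights to exactly zero, so the neurons left in `selected` form a small candidate set. On real activations, the probe would need to be evaluated on held-out data before calling the selection reliable.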

Abstract

Large language models (LLMs) frequently generate hallucinations -- plausible but factually incorrect outputs -- undermining their reliability. While prior work has examined hallucinations from macroscopic perspectives such as training data and objectives, the underlying neuron-level mechanisms remain largely unexplored. In this paper, we conduct a systematic investigation into hallucination-associated neurons (H-Neurons) in LLMs from three perspectives: identification, behavioral impact, and origins. Regarding their identification, we demonstrate that a remarkably sparse subset of neurons (less than $0.1\%$ of total neurons) can reliably predict hallucination occurrences, with strong generalization across diverse scenarios. In terms of behavioral impact, controlled interventions reveal that these neurons are causally linked to over-compliance behaviors. Concerning their origins, we…
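
The abstract mentions controlled interventions without specifying their form here. One common style of neuron-level intervention is to rescale the activations of a chosen set of hidden units and compare the outputs. The sketch below applies that to a toy two-layer network in NumPy. The network, the `forward` helper, the `target` indices, and the scale factors are all hypothetical, and this should not be read as the authors' protocol.

```python
import numpy as np

def forward(x, W_in, W_out, neuron_idx=None, scale=1.0):
    """Toy two-layer ReLU network with optional scaling of chosen hidden units."""
    hidden = np.maximum(x @ W_in, 0.0)        # np.maximum returns a fresh array
    if neuron_idx is not None:
        # Intervention: multiply the selected neurons' activations by `scale`.
        # scale=0.0 ablates them, scale>1.0 amplifies them.
        hidden[:, neuron_idx] *= scale
    return hidden @ W_out, hidden

rng = np.random.default_rng(0)
W_in = rng.standard_normal((8, 32))           # hypothetical weights
W_out = rng.standard_normal((32, 4))
x = rng.standard_normal((5, 8))               # batch of 5 toy inputs
target = [3, 17]                              # hypothetical neuron indices

base_out, base_hidden = forward(x, W_in, W_out)
ablated_out, ablated_hidden = forward(x, W_in, W_out, target, scale=0.0)
amplified_out, amplified_hidden = forward(x, W_in, W_out, target, scale=2.0)

# Change in the output attributable to the targeted neurons.
print("mean |delta| when ablated:  ", np.abs(ablated_out - base_out).mean())
print("mean |delta| when amplified:", np.abs(amplified_out - base_out).mean())
```

Because only the selected units are touched, any difference between `base_out` and the intervened outputs comes from those units alone. That isolation is what lets an intervention support a causal reading rather than a purely correlational one.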

Taxonomy

Topics: Schizophrenia research and treatment · Adversarial Robustness in Machine Learning · Ferroelectric and Negative Capacitance Devices