Understanding New-Knowledge-Induced Factual Hallucinations in LLMs: Analysis and Interpretation

Renfei Dang; Peng Hu; Zhejian Lai; Changjiang Gao; Min Zhang; Shujian Huang

arXiv:2511.02626·cs.CL·April 20, 2026

TL;DR

This paper investigates how fine-tuning on new knowledge causes factual hallucinations in large language models, analyzing underlying mechanisms and proposing methods to mitigate such errors.

Contribution

It introduces a controlled dataset and provides a detailed analysis of hallucination mechanisms, highlighting the role of attention shifts and knowledge familiarity.

Findings

01

Hallucinations affect both new knowledge tasks and other evaluation tasks.

02

The familiarity of a knowledge type influences hallucination severity more than the overall proportion of new knowledge.

03

Reintroducing known knowledge during training reduces hallucinations by restoring attention to key entities.
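
As a rough illustration of the third finding, the sketch below mixes a fraction of already-known samples back into a fine-tuning set. It is not the authors' procedure: the function name `mix_known_knowledge`, the `known_fraction` parameter, and its default value are hypothetical, and this summary does not state what ratio or sampling scheme the paper uses.

```python
import random


def mix_known_knowledge(new_samples, known_samples, known_fraction=0.2, seed=0):
    """Build a shuffled fine-tuning list that includes known-knowledge samples.

    Hypothetical helper for illustration only. `known_fraction` is the target
    share of known samples in the final mix, not a value taken from the paper.
    """
    if not 0.0 <= known_fraction < 1.0:
        raise ValueError("known_fraction must be in [0, 1)")

    rng = random.Random(seed)

    # Solve n_known / (n_known + n_new) = known_fraction for n_known.
    n_known = round(len(new_samples) * known_fraction / (1.0 - known_fraction))
    # Never request more known samples than are available.
    n_known = min(n_known, len(known_samples))

    replay = rng.sample(known_samples, n_known)

    # Tag each item with its origin so the mix can be inspected later.
    mixed = [("new", s) for s in new_samples] + [("known", s) for s in replay]
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    new = [f"new fact {i}" for i in range(8)]
    known = [f"known fact {i}" for i in range(10)]
    for origin, text in mix_known_knowledge(new, known):
        print(origin, text)
```

The origin tags are kept only so the composition of the mix can be checked; a real training pipeline would pass the text alone to the tokenizer.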

Abstract

Prior works have shown that fine-tuning on new knowledge can induce factual hallucinations in large language models (LLMs), leading to incorrect outputs when evaluated on previously known information. However, the specific manifestations of such hallucination and its underlying mechanisms remain insufficiently understood. Our work addresses this gap by designing a controlled dataset, Biography-Reasoning, and conducting a fine-grained analysis across multiple knowledge types and two task types, including knowledge question answering (QA) and knowledge reasoning tasks. We find that hallucinations not only severely affect tasks involving newly introduced knowledge, but also propagate to other evaluation tasks. Moreover, when fine-tuning on a dataset in which a specific knowledge type consists entirely of new knowledge, LLMs exhibit elevated hallucination tendencies. This suggests…
