Understanding New-Knowledge-Induced Factual Hallucinations in LLMs: Analysis and Interpretation
Renfei Dang, Peng Hu, Zhejian Lai, Changjiang Gao, Min Zhang, Shujian Huang

TL;DR
This paper investigates how fine-tuning on new knowledge causes factual hallucinations in large language models, analyzing underlying mechanisms and proposing methods to mitigate such errors.
Contribution
The paper introduces a controlled dataset, Biography-Reasoning, and provides a fine-grained analysis of hallucination mechanisms, highlighting the role of attention shifts and knowledge familiarity.
Findings
Hallucinations severely affect tasks involving newly introduced knowledge and also propagate to other evaluation tasks.
Familiarity of knowledge type influences hallucination severity more than the proportion of new knowledge.
Reintroducing known knowledge during training reduces hallucinations by restoring attention to key entities.
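The last finding can be pictured as a data-mixing step before fine-tuning. The sketch below is only an illustration, not the authors' procedure: the function name `mix_in_known_samples`, the `known_fraction` parameter, and its default value are all hypothetical, and the summary does not say how much known knowledge the paper reintroduces or how it is selected.

```python
import random


def mix_in_known_samples(new_samples, known_samples, known_fraction=0.2, seed=0):
    """Return a shuffled training list in which roughly `known_fraction`
    of the examples come from `known_samples` (hypothetical helper)."""
    if not 0.0 <= known_fraction < 1.0:
        raise ValueError("known_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Number of known examples needed so they form `known_fraction` of the mix.
    n_known = round(len(new_samples) * known_fraction / (1.0 - known_fraction))
    # Never request more known examples than are available.
    n_known = min(n_known, len(known_samples))
    mixed = list(new_samples) + rng.sample(list(known_samples), n_known)
    rng.shuffle(mixed)
    return mixed


# Toy usage with placeholder strings standing in for training examples.
new_facts = [f"new-{i}" for i in range(8)]
known_facts = [f"known-{i}" for i in range(10)]
train_set = mix_in_known_samples(new_facts, known_facts, known_fraction=0.2)
```

Fixing the seed keeps the mix reproducible, which helps when comparing runs with and without the reintroduced known knowledge.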
Abstract
Prior works have shown that fine-tuning on new knowledge can induce factual hallucinations in large language models (LLMs), leading to incorrect outputs when evaluated on previously known information. However, the specific manifestations of such hallucination and its underlying mechanisms remain insufficiently understood. Our work addresses this gap by designing a controlled dataset, Biography-Reasoning, and conducting a fine-grained analysis across multiple knowledge types and two task types, including knowledge question answering (QA) and knowledge reasoning tasks. We find that hallucinations not only severely affect tasks involving newly introduced knowledge, but also propagate to other evaluation tasks. Moreover, when fine-tuning on a dataset in which a specific knowledge type consists entirely of new knowledge, LLMs exhibit elevated hallucination tendencies. This suggests…
