Look Within, Why LLMs Hallucinate: A Causal Perspective
He Li, Haoang Chi, Mingyu Liu, Wenjing Yang

TL;DR
This paper investigates the role of self-attention layers in LLM hallucinations from a causal perspective, showing that disabling certain layers can reduce hallucination issues and offering new insights for mitigation.
Contribution
It introduces a causal intervention method to study self-attention layers in LLMs, revealing their impact on hallucinations and suggesting targeted layer modifications for mitigation.
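The intervention can be pictured with a toy model. The sketch below is not the authors' implementation: it assumes that "disabling" a self-attention layer means zeroing its contribution so that the residual stream passes through unchanged, which keeps the model's structure and size intact. The names `self_attention`, `block`, `forward`, and `disabled_layers` are hypothetical, and the attention uses identity projections purely for brevity.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Toy single-head self-attention with identity Q/K/V projections (an assumption)."""
    d = len(x[0])
    out = []
    for q in x:
        # Scaled dot-product scores of this query against every token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Weighted sum of the value vectors (here the inputs themselves).
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def block(x, attention_enabled=True):
    """Residual block x + attention(x); a disabled block adds nothing (assumed semantics)."""
    if not attention_enabled:
        return [row[:] for row in x]
    attn = self_attention(x)
    return [[a + b for a, b in zip(r, s)] for r, s in zip(x, attn)]


def forward(x, num_layers, disabled_layers=()):
    """Run the stack, skipping attention in the layers listed in disabled_layers."""
    for i in range(num_layers):
        x = block(x, attention_enabled=i not in disabled_layers)
    return x


tokens = [[1.0, 0.0], [0.0, 1.0]]
baseline = forward(tokens, num_layers=3)
front_ablated = forward(tokens, num_layers=3, disabled_layers={0})
```

Comparing `baseline` with `front_ablated` mirrors the shape of the experiment: the same model is run with and without one layer's attention, and the outputs are then compared. In the paper the comparison is between degrees of hallucination in real LLMs, which this toy does not attempt to measure.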
Findings
Disabling specific self-attention layers reduces hallucinations.
Front and tail layer interventions are most effective.
Provides a new causal understanding of hallucination mechanisms.
Abstract
The emergence of large language models (LLMs) is a milestone in generative artificial intelligence, with significant success in text comprehension and generation tasks. Despite this success in many downstream tasks, LLMs suffer from severe hallucination problems, which pose significant challenges to their practical application. Most work on LLM hallucination focuses on data quality. Self-attention is a core module in transformer-based LLMs, yet its potential relationship with hallucination has hardly been investigated. To fill this gap, we study the problem from a causal perspective. We propose a method that intervenes in LLMs' self-attention layers while keeping their structures and sizes intact. Specifically, we disable different self-attention layers in several popular open-source LLMs and then compare their degrees of hallucination with the…
