ART: Attention Replacement Technique to Improve Factuality in LLMs
Ziqin Luo, Yihao Quan, Xiaofeng Zhang, Xiaosong Yuan, Chen Shen

TL;DR
This paper introduces ART, a training-free method that replaces uniform attention in shallow layers of LLMs with local attention to reduce hallucinations and improve factual accuracy.
Contribution
The paper presents a training-free attention replacement technique that reduces hallucinations in LLMs by modifying shallow-layer attention patterns.
Findings
ART significantly reduces hallucinations across multiple LLM architectures.
Replacing uniform attention with local attention improves focus on relevant information.
The method does not require fine-tuning or additional training data.
Abstract
Hallucination in large language models (LLMs) continues to be a significant issue, particularly in tasks like question answering, where models often generate plausible yet incorrect or irrelevant information. Although various methods have been proposed to mitigate hallucinations, the relationship between attention patterns and hallucinations has not been fully explored. In this paper, we analyze the distribution of attention scores across each layer and attention head of LLMs, revealing a common and intriguing phenomenon: shallow layers of LLMs primarily rely on uniform attention patterns, where the model distributes its attention evenly across the entire sequence. This uniform attention pattern can lead to hallucinations, as the model fails to focus on the most relevant information. To mitigate this issue, we propose a training-free method called Attention Replacement Technique (ART),…
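The abstract is cut off before it describes how ART works, so the following is only a rough sketch of the general idea it states: detect attention that is spread almost evenly over the sequence, then substitute a local pattern. Everything concrete here is an assumption made for illustration and is not taken from the paper: the entropy-based uniformity score, the `threshold` of 0.95, the `window` size, the even weighting inside the window, and all function names.

```python
import math

def row_entropy(row):
    """Shannon entropy of one attention row (weights sum to 1)."""
    return -sum(p * math.log(p) for p in row if p > 0.0)

def uniformity_score(attn):
    """Mean ratio of row entropy to the maximum entropy a causal row can have.

    attn is a square list of lists holding causal attention weights.
    Row i can see i + 1 positions, so its maximum entropy is log(i + 1).
    Row 0 is skipped because its entropy is always 0.
    Requires at least two rows.
    """
    ratios = [row_entropy(attn[i]) / math.log(i + 1) for i in range(1, len(attn))]
    return sum(ratios) / len(ratios)

def local_causal_attention(seq_len, window):
    """Causal attention in which each query weights its last `window` positions evenly.

    Even weighting is an illustrative choice, not the paper's definition.
    """
    attn = []
    for i in range(seq_len):
        start = max(0, i - window + 1)
        width = i - start + 1
        attn.append([1.0 / width if start <= j <= i else 0.0 for j in range(seq_len)])
    return attn

def replace_if_uniform(attn, window=4, threshold=0.95):
    """Swap in local attention when the map is close to uniform (hypothetical rule)."""
    if uniformity_score(attn) >= threshold:
        return local_causal_attention(len(attn), window)
    return attn
```

In a real model the replacement would have to be applied per layer and per head inside the attention computation, and the choice of which shallow layers to modify would follow the paper's own analysis rather than a fixed threshold like the one assumed here.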
