Wisdom is Knowing What not to Say: Hallucination-Free LLMs Unlearning via Attention Shifting

Chenchen Tan; Youyang Qu; Xinghao Li; Hui Zhang; Shujie Cui; Cunjian Chen; Longxiang Gao

arXiv:2510.17210·cs.CL·April 20, 2026

TL;DR

This paper introduces Attention-Shifting, a framework for selective unlearning in large language models that aims to remove memorized sensitive data while preserving response quality and minimizing hallucinations.

Contribution

The Attention-Shifting approach suppresses attention to tokens targeted for unlearning and enhances attention to retained tokens, improving unlearning effectiveness while preserving model utility.

Findings

01 Achieves up to 15% higher accuracy on the ToFU benchmark

02 Attains a 10% improvement on the TDEC benchmark

03 Maintains competitive hallucination-free unlearning

Abstract

Growing computing power and the demand for AI-assisted decision-making have driven the widespread adoption of large language models (LLMs). At the same time, the risk that LLMs retain sensitive data has spurred increasing research into machine unlearning. However, existing unlearning approaches face a critical dilemma: aggressive unlearning compromises model utility, while conservative strategies preserve utility but risk hallucinated responses. This significantly limits LLMs' reliability in knowledge-intensive applications. To address this, we introduce a novel Attention-Shifting (AS) framework for selective unlearning. AS is driven by two design objectives: (1) context-preserving suppression that attenuates attention to fact-bearing tokens without disrupting LLMs' linguistic structure; and (2) hallucination-resistant response shaping that discourages fabricated completions…
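
To make the first objective concrete, the toy sketch below shows one way attention to selected tokens can be attenuated without disturbing the relative weighting of the remaining context. It is not the paper's procedure, which the truncated abstract does not specify. The function names, the `scale` hyperparameter, the example sentence, and the choice of which token counts as "fact-bearing" are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attenuate_attention(logits, suppressed, scale=0.1):
    """Return attention weights with selected key positions down-weighted.

    logits     : raw attention scores of one query over all key tokens
    suppressed : indices of tokens to attenuate (hypothetical fact-bearing tokens)
    scale      : assumed hyperparameter in (0, 1]; each suppressed token's
                 unnormalized weight is multiplied by this factor
    """
    if not 0.0 < scale <= 1.0:
        raise ValueError("scale must be in (0, 1]")
    # Adding log(scale) to a logit multiplies exp(logit) by scale.
    bias = math.log(scale)
    suppressed = set(suppressed)
    shifted = [x + bias if i in suppressed else x for i, x in enumerate(logits)]
    return softmax(shifted)


# Illustrative input only: the sentence and scores are made up.
tokens = ["The", "author", "was", "born", "in", "Paris"]
logits = [0.5, 1.0, 0.2, 0.8, 0.1, 2.0]

before = softmax(logits)
after = attenuate_attention(logits, suppressed=[5], scale=0.1)

for tok, b, a in zip(tokens, before, after):
    print(f"{tok:>7}: {b:.3f} -> {a:.3f}")
```

Because the bias is applied before normalization, the weight removed from the suppressed token is redistributed across the other tokens in proportion to their original weights, so the ratios among the non-suppressed tokens are unchanged. That is the sense in which this kind of suppression can leave the surrounding context intact.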
