TL;DR
This paper introduces an Attention-Shifting framework for selective unlearning in large language models, aiming to reduce memorized sensitive data while maintaining response quality and minimizing hallucinations.
Contribution
The Attention-Shifting approach suppresses attention to tokens targeted for unlearning and enhances attention to tokens that should be retained, improving unlearning effectiveness while preserving model utility.
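To make the idea of shifting attention concrete, here is a minimal sketch that lowers the attention weight on some key positions and raises it on others by adding a bias to the pre-softmax scores. This is not the paper's implementation, which the summary does not describe in detail. The additive bias, the function name `shifted_attention`, and the `suppress` and `boost` parameters are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def shifted_attention(scores, forget_mask, retain_mask, suppress=4.0, boost=1.0):
    """Hypothetical helper: re-weight attention over key positions.

    scores:      (queries, keys) array of pre-softmax attention logits.
    forget_mask: boolean array over keys marking tokens to attend to less.
    retain_mask: boolean array over keys marking tokens to attend to more.
    suppress, boost: illustrative bias magnitudes (assumed, not from the paper).
    """
    bias = np.zeros(scores.shape[-1])
    bias[forget_mask] -= suppress   # push attention away from these tokens
    bias[retain_mask] += boost      # pull attention toward these tokens
    return softmax(scores + bias, axis=-1)

# Toy example: one query attending uniformly over four key tokens.
scores = np.zeros((1, 4))
forget = np.array([False, True, False, False])
retain = np.array([True, False, False, True])

base = softmax(scores)
shifted = shifted_attention(scores, forget, retain)
print(base.round(3))
print(shifted.round(3))
```

Because the weights are renormalized by the softmax, the mass removed from the suppressed token is redistributed over the remaining tokens, with the boosted positions receiving the largest share. A trained method would more plausibly reach a similar effect through a loss on the attention maps rather than a fixed bias.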
Findings
Achieves up to 15% higher accuracy on the ToFU benchmark
Attains a 10% improvement on the TDEC benchmark
Maintains competitive hallucination-free unlearning effectiveness
Abstract
The increase in computing power and the necessity of AI-assisted decision-making boost the growing application of large language models (LLMs). Along with this, the potential retention of sensitive data of LLMs has spurred increasing research into machine unlearning. However, existing unlearning approaches face a critical dilemma: Aggressive unlearning compromises model utility, while conservative strategies preserve utility but risk hallucinated responses. This significantly limits LLMs' reliability in knowledge-intensive applications. To address this, we introduce a novel Attention-Shifting (AS) framework for selective unlearning. AS is driven by two design objectives: (1) context-preserving suppression that attenuates attention to fact-bearing tokens without disrupting LLMs' linguistic structure; and (2) hallucination-resistant response shaping that discourages fabricated completions…