Attention Smoothing Is All You Need For Unlearning

Saleh Zare Zade; Xiangyu Zhou; Sijia Liu; Dongxiao Zhu

arXiv:2603.01285·cs.LG·March 3, 2026

Open Access · 3 Reviews

TL;DR

The paper introduces Attention Smoothing Unlearning (ASU), a novel method that effectively erases memorized knowledge in large language models by smoothing attention distributions, improving unlearning stability and utility retention.

Contribution

ASU is a new unlearning framework that uses self-distillation and attention smoothing to better erase memorized data while preserving model coherence.
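The self-distillation idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that the student is pulled toward the forget-teacher by a KL divergence between next-token distributions; this summary does not state the exact loss ASU uses. The logits and helper names below are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical next-token logits on a forget prompt (not from the paper).
teacher_logits = [1.0, 0.8, 0.9, 0.7]  # assumed forget-teacher: nearly flat
student_logits = [5.0, 0.5, 0.2, 0.1]  # assumed student: confident on a memorized token

teacher_probs = softmax(teacher_logits)
student_probs = softmax(student_logits)

# Assumed distillation loss: the student is trained to match the teacher.
distill_loss = kl_divergence(teacher_probs, student_probs)
print(f"distillation loss: {distill_loss:.4f}")
```

Because the target is a proper probability distribution produced by the model itself, a loss of this kind reaches zero when the student matches the teacher, in contrast to objectives that simply push the loss on forget data upward without limit.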

Findings

1. ASU outperforms baselines in various unlearning scenarios.
2. ASU maintains model utility with minimal loss.
3. ASU effectively erases factual memorization.

Abstract

Large Language Models are prone to memorizing sensitive, copyrighted, or hazardous content, posing significant privacy and legal concerns. Retraining from scratch is computationally infeasible, whereas current unlearning methods exhibit unstable trade-offs between forgetting and utility, frequently producing incoherent outputs on forget prompts and failing to generalize due to the persistence of lexical-level and semantic-level associations in attention. We propose Attention Smoothing Unlearning (ASU), a principled framework that casts unlearning as self-distillation from a forget-teacher derived from the model's own attention. By increasing the softmax temperature, ASU flattens attention distributions and directly suppresses the lexical-level and semantic-level associations responsible for reconstructing memorized knowledge. This results in a bounded optimization objective that erases…
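The temperature mechanism described in the abstract can be sketched as follows. This is a toy illustration under assumed values: the attention scores, the temperatures, and the helper names are hypothetical and not taken from the paper. It shows only that dividing scores by a temperature greater than 1 before the softmax flattens the resulting attention weights.

```python
import math

def softmax_with_temperature(scores, temperature=1.0):
    """Softmax over raw attention scores divided by a temperature (hypothetical helper)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means a flatter distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical pre-softmax attention scores of one query over four keys.
scores = [4.0, 1.0, 0.5, 0.2]

sharp = softmax_with_temperature(scores, temperature=1.0)   # standard attention
smooth = softmax_with_temperature(scores, temperature=4.0)  # assumed smoothing temperature

print("sharp :", [round(p, 3) for p in sharp])
print("smooth:", [round(p, 3) for p in smooth])
print("entropy sharp :", round(entropy(sharp), 3))
print("entropy smooth:", round(entropy(smooth), 3))
```

With the higher temperature, the weight on the dominant key drops and the entropy of the attention row rises, which is the flattening effect the abstract attributes to ASU's forget-teacher.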

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 4

Strengths

- The paper provides a conceptually simple yet effective formulation for unlearning based on attention smoothing.
- The study includes comprehensive evaluations across QA and free-form completion settings, as well as scenario-based real-world tests.
- The experiments demonstrate that ASU consistently outperforms existing unlearning baselines.

Weaknesses

- The work is heavily oriented toward experimental performance, and it lacks deeper analytical insight. The paper would benefit from additional analysis explaining why attention smoothing leads to differential effects on factual versus functional tokens.
- The paper does not provide formal guidance for selecting the optimal attention temperature parameter, leaving it heuristic.

Reviewer 02 · Rating 4 · Confidence 4

Strengths

1. The classification of unlearning methods into two categories (e.g., divergence and convergence) provides an interesting conceptual framework that effectively supports the central idea of the paper.
2. The paper conducts extensive experiments on multiple benchmarks, including TOFU, MUSE, and WMDP, demonstrating the robustness of the proposed approach.
3. The paper is well organized and clearly written, making the overall argument easy to follow and the experimental results easy to interpret.

Weaknesses

1. Attention Smoothing Unlearning (ASU) can be interpreted through the lens of both divergence and convergence. Specifically, emphasizing knowledge distillation aligns with convergence toward the teacher model, whereas smoothing attention may introduce divergence by disrupting previously salient attention patterns. Therefore, it remains ambiguous whether ASU should be categorized under either paradigm or considered as an independent mechanism beyond both.
2. The rationale behind how ASU mitigat

Reviewer 03 · Rating 4 · Confidence 3

Strengths

- The proposed ASU framework requires minimal architectural changes, making it highly practical for large-scale unlearning.
- The methodology is clearly described and the paper is easy to follow.

Weaknesses

- As mentioned in the paper, the teacher model is constructed by applying attention smoothing, i.e., increasing the softmax temperature in the self-attention mechanism. Will this operation hurt the performance of the teacher model?
- Since the method requires both a student model and a teacher model, the computational cost of forgetting should also be used as an indicator when evaluating each method.
- Can you provide a detailed proof that applying attention smoothing can make the teach

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning