SHRED: Retain-Set-Free Unlearning via Self-Distillation with Logit Demotion

Zizhao Hu; Ameya Godbole; Johnny Tian-Zheng Wei; Mohammad Rostami; Jesse Thomason; Robin Jia

arXiv:2605.07482 · cs.LG · May 11, 2026

TL;DR

SHRED is a retain-set-free unlearning method for large language models that selectively demotes memorized tokens using self-distillation, achieving a better balance between forgetting specific content and maintaining overall utility.

Contribution

SHRED introduces a novel retain-set-free unlearning approach that leverages token-level information and self-distillation to efficiently forget memorized content without extra data dependencies.

Findings

01. SHRED outperforms retain-set-dependent methods on standard benchmarks.

02. It achieves a superior trade-off between forget efficacy and model utility.

03. SHRED is robust against relearning and membership-inference attacks.

Abstract

Machine unlearning for large language models (LLMs) aims to selectively remove memorized content such as private data, copyrighted text, or hazardous knowledge, without costly full retraining. Most existing methods require a retain set of curated examples to prevent catastrophic degradation of general model utility, creating an extra data dependency that complicates deployment. We propose SHRED (Self-distillation via High-surprisal-only Retain-set-free Entropy Demotion), a retain-set-free unlearning method built on a key insight: not all tokens within a forget set instance carry memorized information equally. High-information tokens concentrate the model's memorized knowledge, while low-information tokens reflect general language competence. SHRED operates in two stages. (1) Selection: We perform a forward pass on a forget set instance, collect per-token autoregressive probabilities,…
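The Selection stage described above begins by collecting per-token autoregressive probabilities from a forward pass. The following is a minimal sketch of that bookkeeping only, not the authors' implementation. It assumes the next-token logits are already available as plain lists. The function names, the `fraction` parameter, and the top-fraction rule in `select_high_surprisal` are hypothetical placeholders, since the abstract is truncated before it states the actual selection criterion.

```python
import math


def token_surprisals(logits, token_ids):
    """Return the surprisal -log p(token_t | prefix) for each position.

    logits[t] holds the vocabulary logits that predict token_ids[t].
    """
    surprisals = []
    for row, tok in zip(logits, token_ids):
        # Numerically stable log-sum-exp over the vocabulary.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        # -log softmax(row)[tok] = log_z - row[tok]
        surprisals.append(log_z - row[tok])
    return surprisals


def select_high_surprisal(surprisals, fraction=0.5):
    """Hypothetical rule: positions of the top `fraction` tokens by surprisal.

    The paper's real criterion is not given in the truncated abstract.
    """
    k = max(1, math.ceil(fraction * len(surprisals)))
    ranked = sorted(range(len(surprisals)),
                    key=lambda i: surprisals[i], reverse=True)
    return sorted(ranked[:k])


# Toy example: a two-token vocabulary and a two-token sequence.
toy_logits = [[0.0, 0.0], [math.log(3.0), 0.0]]
toy_tokens = [0, 0]
toy_surprisals = token_surprisals(toy_logits, toy_tokens)
toy_selected = select_high_surprisal(toy_surprisals, fraction=0.5)
```

In the toy example the first token has probability 1/2 and the second 3/4, so the first position carries the higher surprisal and is the one selected. In practice the logits would come from the language model being unlearned, and the selected positions would feed the second stage of the method.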
