Forget What Matters, Keep the Rest: Selective Unlearning of Informative Tokens
Seunghee Koh, Sunghyun Baek, Youngdong Kim, Junmo Kim

TL;DR
This paper introduces Entropy-guided Token Weighting (ETW), a novel method for selective unlearning in large language models that uses entropy to identify and prioritize informative tokens, improving unlearning effectiveness and utility preservation.
Contribution
The paper proposes a new entropy-based token weighting method for unlearning in LLMs that overcomes limitations of previous approaches relying on external tools or confidence scores.
Findings
ETW effectively identifies informative tokens using entropy.
ETW achieves better unlearning and utility preservation than existing methods.
Informative tokens tend to have higher entropy, while structural tokens tend to have lower entropy.
Abstract
Unlearning in large language models (LLMs) has emerged as a promising safeguard against adversarial behaviors. When the forgetting loss is applied uniformly without considering token-level semantic importance, model utility can be unnecessarily degraded. Recent studies have explored token-wise loss regularizers that prioritize informative tokens, but largely rely on ground-truth confidence or external linguistic parsers, which limits their ability to capture contextual information or the model's overall predictive state. Intuitively, function words like "the" primarily serve syntactic roles and are highly predictable with little ambiguity, but informative words admit multiple plausible alternatives with greater uncertainty. Based on this intuition, we propose Entropy-guided Token Weighting (ETW), a token-level unlearning regularizer that uses entropy of the predictive distribution as a…
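The intuition in the abstract can be illustrated with a minimal sketch: compute the entropy of the model's predictive distribution at each token position, then use it to weight a per-token forgetting loss. The abstract is truncated before it says how ETW maps entropy to weights, so the sum-to-one normalization below, the function names (`token_entropy`, `entropy_weights`, `weighted_forget_loss`), and the toy logits are all assumptions made for illustration, not the authors' formulation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_entropy(logits):
    """Shannon entropy (in nats) of the predictive distribution at one position."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_weights(per_token_logits):
    """Hypothetical weighting: entropies normalized to sum to 1.

    This normalization is an assumption; the source does not state
    how ETW turns entropy into weights.
    """
    entropies = [token_entropy(l) for l in per_token_logits]
    total = sum(entropies)
    if total == 0.0:
        # Degenerate case: fall back to uniform weights.
        return [1.0 / len(entropies)] * len(entropies)
    return [h / total for h in entropies]

def weighted_forget_loss(per_token_losses, weights):
    """Weighted sum of per-token forgetting losses (placeholder loss values)."""
    return sum(w * l for w, l in zip(weights, per_token_losses))

# Toy 4-word vocabulary: a sharply peaked distribution (a predictable
# function word such as "the") versus a flat one (an informative word
# with several plausible alternatives).
peaked = [10.0, 0.0, 0.0, 0.0]
flat = [1.0, 1.0, 1.0, 1.0]

weights = entropy_weights([peaked, flat])
loss = weighted_forget_loss([0.5, 2.0], weights)
```

In this toy case the flat distribution receives nearly all of the weight, so the forgetting loss concentrates on the position the model is uncertain about and largely ignores the predictable one.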
