Representation-Aware Unlearning via Activation Signatures: From Suppression to Knowledge-Signature Erasure

Syed Naveed Mahmood; Md. Rezaur Rahman Bhuiyan; Tasfia Zaman; Jareen Tasneem Khondaker; Md. Sameer Sakib; K. M. Shadman Wadith; Nazia Tasnim; Farig Sadeque

arXiv:2601.10566 · cs.CL · March 19, 2026

TL;DR

This paper introduces the Knowledge Immunization Framework (KIF), a representation-aware framework for true knowledge erasure in large language models that distinguishes genuine unlearning from surface-level suppression while preserving model utility.

Contribution

The paper presents an unlearning architecture that targets internal activation signatures rather than surface outputs, aiming for durable erasure and addressing the main limitation of prior methods, which only suppress surface-level behavior.

Findings

1. KIF achieves near-oracle erasure with minimal utility loss.
2. Standard models show scale-independent true erasure (<3% utility drift).
3. Reasoning-prior models exhibit architectural divergence that affects unlearning.

Abstract

Selective knowledge erasure from LLMs is critical for GDPR compliance and model safety, yet current unlearning methods conflate behavioral suppression with true knowledge removal, allowing latent capabilities to persist beneath surface-level refusals. In this work, we address this challenge by introducing the Knowledge Immunization Framework (KIF), a representation-aware architecture that distinguishes genuine erasure from obfuscation by targeting internal activation signatures rather than surface outputs. Our approach combines dynamic suppression of subject-specific representations with parameter-efficient adaptation, enabling durable unlearning without full model retraining. KIF achieves near-oracle erasure (forget quality FQ ≈ 0.99 vs. 1.00) while preserving utility at oracle levels (model utility MU = 0.62), effectively breaking the stability-erasure tradeoff that has constrained all prior work. We evaluate…
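The abstract describes KIF only at a high level. As a rough illustration of what suppressing a subject-specific activation signature can look like, the sketch below makes two assumptions that are ours, not the paper's: a "signature" is approximated by a difference-of-means direction in one layer's hidden states, and "dynamic" suppression means projecting that direction out only when a hidden state's projection onto it exceeds a threshold. The names `signature_direction` and `dynamic_suppress`, the `threshold` parameter, and the toy data are all hypothetical, and the parameter-efficient adaptation component is not modeled.

```python
import numpy as np

# Hypothetical sketch: difference-of-means "signature" plus a gated projection.
# This is an illustrative assumption, NOT KIF's actual method.


def signature_direction(forget_acts: np.ndarray, retain_acts: np.ndarray) -> np.ndarray:
    """Unit vector pointing from the mean retain activation to the mean forget activation.

    Both inputs have shape (num_examples, hidden_dim) and come from the same layer.
    """
    diff = forget_acts.mean(axis=0) - retain_acts.mean(axis=0)
    norm = np.linalg.norm(diff)
    if norm == 0.0:
        raise ValueError("forget and retain activations have identical means")
    return diff / norm


def dynamic_suppress(hidden: np.ndarray, direction: np.ndarray, threshold: float) -> np.ndarray:
    """Project out the signature component, but only for rows where it fires.

    A row counts as subject-specific when its projection onto `direction`
    exceeds `threshold`; all other rows pass through unchanged.
    """
    coeff = hidden @ direction                        # projection per row, shape (n,)
    gate = (coeff > threshold).astype(hidden.dtype)   # 1.0 where the signature fires
    return hidden - (gate * coeff)[:, None] * direction[None, :]


# Toy data: forget-set activations are shifted along one hidden axis.
rng = np.random.default_rng(0)
hidden_dim = 16
true_axis = np.zeros(hidden_dim)
true_axis[0] = 1.0
retain_acts = rng.normal(size=(200, hidden_dim))
forget_acts = rng.normal(size=(200, hidden_dim)) + 3.0 * true_axis

direction = signature_direction(forget_acts, retain_acts)
cleaned = dynamic_suppress(forget_acts, direction, threshold=0.5)
print("alignment with true axis:", round(float(direction @ true_axis), 3))
print("max remaining projection:", round(float((cleaned @ direction).max()), 3))
```

Gating on the projection leaves hidden states that do not carry the signature untouched, which is one simple way to limit collateral damage on retained inputs; in practice the layer, the direction estimate, and the threshold would all have to be chosen empirically.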

Taxonomy

Topics: Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning · Advanced Graph Neural Networks