Forgetting That Sticks: Quantization-Permanent Unlearning via Circuit Attribution
Saisab Sadhu, Pratinav Seth, Vinay Kumar Sankarapu

TL;DR
This paper introduces MANSU, a novel unlearning method that ensures forgetting persists after quantization by combining circuit attribution, null-space projection, and quantization-aware bounds, addressing systematic failures in existing approaches.
Contribution
The paper proposes MANSU, a new unlearning technique that guarantees persistent forgetting post-quantization, and introduces Circuit Attribution Divergence as a verification metric for structural erasure.
Findings
MANSU outperforms gradient-based methods under quantization.
Existing gradient-based methods that achieve meaningful forgetting lose it after 4-bit quantization, while methods that survive quantization barely change the model.
MANSU achieves meaningful forgetting and structural erasure across models.
Abstract
Standard unlearning evaluations measure behavioral suppression in full precision, immediately after training, despite every deployed language model being quantized first. Recent work has shown that 4-bit post-training quantization can reverse machine unlearning; we show this is not a tuning artefact but a systematic dual failure: gradient-based methods that achieve meaningful forgetting lose it under compression, while methods that survive quantization barely change the model. Both failures trace to the same root cause: across all baselines, per-parameter updates lie 47-828x below the NF4 quantization bin width; updates diffused across billions of parameters cannot clear quantization bin boundaries, a consequence we formalize as a sparsity-permanence tradeoff. We present MANSU (Mechanistic-Aligned Null-Space Unlearning), which resolves both modes by combining causal circuit attribution…
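The bin-width argument in the abstract can be illustrated with a small sketch. This is not the paper's code and it does not implement NF4: it assumes a plain uniform 4-bit quantizer over [-1, 1], and the names `quantize_uniform` and `fraction_changed`, the weight distribution, and the update sizes are all hypothetical choices made for illustration. It compares a diffuse update (every weight moved by 1/100 of a bin width, a ratio inside the 47-828x range the abstract reports) with a sparse update that moves only 50 weights by 1.5 bin widths each.

```python
import random

def quantize_uniform(weights, bits=4, lo=-1.0, hi=1.0):
    """Map each weight to the index of the nearest of 2**bits evenly spaced levels.

    Assumption: a uniform grid stands in for NF4, whose levels are not evenly spaced.
    """
    step = (hi - lo) / (2 ** bits - 1)  # bin width between adjacent levels
    codes = [round((min(max(w, lo), hi) - lo) / step) for w in weights]
    return codes, step

def fraction_changed(old_codes, new_codes):
    """Share of positions whose quantized code differs between two code lists."""
    return sum(a != b for a, b in zip(old_codes, new_codes)) / len(old_codes)

random.seed(0)
n = 10_000
weights = [random.uniform(-0.5, 0.5) for _ in range(n)]  # toy weights, not a real model
codes, step = quantize_uniform(weights)

# Diffuse update: every weight moves by 1/100 of a bin width.
diffuse = [w + step / 100 for w in weights]
diffuse_changed = fraction_changed(codes, quantize_uniform(diffuse)[0])

# Sparse update: only the first 50 weights move, each by 1.5 bin widths.
# Total L1 movement (75 * step) is smaller than the diffuse case (100 * step).
k = 50
sparse = list(weights)
for i in range(k):
    sparse[i] += 1.5 * step
sparse_codes = quantize_uniform(sparse)[0]
touched_changed = fraction_changed(codes[:k], sparse_codes[:k])

print(f"bin width: {step:.4f}")
print(f"diffuse update, share of codes changed: {diffuse_changed:.4f}")
print(f"sparse update, share of touched codes changed: {touched_changed:.4f}")
```

Under these assumptions, the diffuse update leaves almost every quantized code untouched, so the quantized model is nearly identical to the original, while every weight touched by the sparse update lands in a different bin. This only mirrors the intuition behind the sparsity-permanence tradeoff; how MANSU selects which parameters to update is described in the paper, not here.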
