TL;DR
ZeroUnlearn is a novel few-shot framework for efficiently unlearning sensitive information in large language models by re-mapping knowledge through model editing, outperforming existing methods.
Contribution
It introduces a precise, efficient model editing approach for knowledge unlearning that preserves model utility and extends to multi-sample scenarios.
Findings
Outperforms existing unlearning baselines in experiments.
Enforces representational orthogonality for targeted unlearning.
Maintains overall model utility after unlearning.
Abstract
Large language models inevitably retain sensitive information, defined as inputs that may induce harmful generations, due to training on massive web corpora, raising concerns for privacy and safety. Existing machine unlearning methods primarily rely on retraining or aggressive fine-tuning, which are either computationally expensive or prone to degrading related knowledge and overall model utility. In this work, we reformulate machine unlearning as a precise knowledge re-mapping problem via model editing. We propose ZeroUnlearn, a few-shot unlearning framework. It overwrites sensitive inputs by mapping them to a neutral target state and removing their original representations. ZeroUnlearn enforces representational orthogonality through a multiplicative parameter update with a closed-form solution, enabling efficient and targeted unlearning. We further extend ZeroUnlearn to a…
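The abstract mentions a multiplicative parameter update with a closed-form solution that enforces representational orthogonality, but the text available here does not give the rule itself. The sketch below is therefore not ZeroUnlearn's update. It shows one standard way such an update could look, under the assumption that "removing" a sensitive input means the edited weight matrix sends its key vector to zero: W' = W (I - P), where P is the orthogonal projector onto the span of the sensitive keys. The names `null_space_edit`, `orthonormal_basis`, and `matvec` are hypothetical, and the step of mapping inputs to a neutral target state is not modeled.

```python
# Illustrative sketch only; NOT the paper's update rule.
# Assumption: the edited weights W' should satisfy W' k = 0 for every
# sensitive key k, via the multiplicative update W' = W (I - P), where
# P projects orthogonally onto span(keys).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(keys, tol=1e-12):
    """Gram-Schmidt over the key vectors; (near-)dependent keys are skipped."""
    basis = []
    for k in keys:
        r = list(k)
        for q in basis:
            c = dot(q, r)
            r = [ri - c * qi for ri, qi in zip(r, q)]
        norm = dot(r, r) ** 0.5
        if norm > tol:
            basis.append([ri / norm for ri in r])
    return basis

def null_space_edit(W, keys):
    """Return W' = W (I - P), computed row by row in closed form."""
    basis = orthonormal_basis(keys)
    edited = []
    for row in W:
        new_row = list(row)
        for q in basis:
            c = dot(row, q)  # component of the original row along q
            new_row = [x - c * qi for x, qi in zip(new_row, q)]
        edited.append(new_row)
    return edited

def matvec(W, x):
    return [dot(row, x) for row in W]

# Toy example: a 2x3 weight matrix and two "sensitive" key vectors.
W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
sensitive_keys = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
W_edited = null_space_edit(W, sensitive_keys)

forgotten = [matvec(W_edited, k) for k in sensitive_keys]
retained_before = matvec(W, [0.0, 0.0, 1.0])
retained_after = matvec(W_edited, [0.0, 0.0, 1.0])
```

In this toy setting the edited matrix maps both sensitive keys to the zero vector, while an input orthogonal to them produces the same output before and after the edit. That is the sense in which a projection-style update can be both targeted and utility-preserving; whether ZeroUnlearn uses this exact construction is not stated in the text above.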
