SoK: Unlearnability and Unlearning for Model Dememorization
Mengying Zhang, Derui Wang, Ruoxi Sun, Xiaoyu Xia, Shuang Hao, Minhui Xue

TL;DR
This paper systematically analyzes two model dememorization techniques, unlearnability and unlearning, highlighting their vulnerabilities and interactions and providing the first theoretical guarantees for their effectiveness.
Contribution
It offers a unified taxonomy, empirical evaluation, and theoretical guarantees, advancing understanding of model dememorization methods.
Findings
Shallow dememorization can lead to false claims of data forgetting.
Input perturbations may affect the effectiveness of downstream unlearning.
Theoretical bounds on dememorization depth are established.
Abstract
Advanced model dememorization methods, including availability poisoning (unlearnability) and machine unlearning, are emerging as key safeguards against data misuse in machine learning (ML). At the training stage, unlearnability embeds imperceptible perturbations into data before release to reduce learnability. At the post-training stage, unlearning removes previously acquired information from models to prevent unauthorized disclosure or use. While both defenses aim to preserve the right to withhold knowledge, their vulnerabilities and shared foundations remain unclear. Specifically, both unlearnability and unlearning suffer from issues such as shallow dememorization, leading to falsely claimed data learnability reduction or forgetting in the presence of weight perturbations. Moreover, input perturbations may affect the effectiveness of downstream unlearning, while unlearning may…
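To make the notion of shallow dememorization under weight perturbations concrete, here is a minimal sketch of a probe. It is not the paper's method: the linear classifier, the Gaussian noise model, the function names, and the toy data are all assumptions chosen for illustration. The idea is that if small random changes to the weights restore accuracy on supposedly forgotten data, the forgetting was only shallow.

```python
import random

def predict(weights, x):
    """Toy linear binary classifier: label 1 if w.x > 0, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x))
    return 1 if score > 0 else 0

def accuracy(weights, data):
    """Fraction of (x, y) pairs classified correctly."""
    correct = sum(1 for x, y in data if predict(weights, x) == y)
    return correct / len(data)

def max_forget_accuracy(weights, forget_set, sigma, trials=200, seed=0):
    """Highest forget-set accuracy found over random Gaussian weight
    perturbations (the unperturbed weights are included as a baseline)."""
    rng = random.Random(seed)
    best = accuracy(weights, forget_set)
    for _ in range(trials):
        noisy = [w + rng.gauss(0.0, sigma) for w in weights]
        best = max(best, accuracy(noisy, forget_set))
    return best

# Hypothetical forget set: the label depends on the sign of the first feature.
forget_set = [
    ([1.0, 0.1], 1),
    ([0.9, -0.1], 1),
    ([-1.0, 0.2], 0),
    ([-0.8, -0.1], 0),
]

# Hypothetical "unlearned" weights that sit just past the decision boundary,
# so every forget-set point is misclassified, but only barely.
unlearned_weights = [-0.05, 0.0]

base = accuracy(unlearned_weights, forget_set)
probed = max_forget_accuracy(unlearned_weights, forget_set, sigma=0.2)
print(f"forget-set accuracy: unperturbed={base:.2f}, after probing={probed:.2f}")
```

A large gap between the unperturbed and probed accuracy suggests the forgotten knowledge sits close by in weight space. A real evaluation would use the actual model, a principled perturbation budget, and a retain set to control for utility.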
