Efficient machine unlearning with minimax optimality
Jingyi Xie, Linjun Zhang, Sai Li

TL;DR
This paper introduces a statistical framework for machine unlearning and, for squared loss, a minimax-optimal method (ULS) that removes the influence of specified data with theoretical guarantees and limited access to the remaining data.
Contribution
It develops a new unlearning method with minimax optimality for squared loss and provides asymptotically valid inference without full retraining.
Findings
ULS achieves near-retraining performance with less data access
The estimation error decomposes into an oracle term and an unlearning cost determined by the forget proportion and the forget model bias
The method is validated through numerical experiments and real-data applications
Abstract
There is a growing demand for efficient data removal to comply with regulations like the GDPR and to mitigate the influence of biased or corrupted data. This has motivated the field of machine unlearning, which aims to eliminate the influence of specific data subsets without the cost of full retraining. In this work, we propose a statistical framework for machine unlearning with generic loss functions and establish theoretical guarantees. For squared loss in particular, we develop Unlearning Least Squares (ULS) and establish its minimax optimality for estimating the model parameter of the remaining data when only the pre-trained estimator, the forget samples, and a small subsample of the remaining data are available. Our results reveal that the estimation error decomposes into an oracle term and an unlearning cost determined by the forget proportion and the forget model bias. We further establish…
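To make the squared-loss setting concrete, here is a minimal sketch of a least-squares "downdate" that uses only the three inputs the abstract names: a pre-trained estimator, the forget samples, and a subsample of the remaining data. It is not the paper's ULS estimator, whose exact form is not given here. It relies on the standard normal-equation identity beta_r = beta_full - (X_r^T X_r)^{-1} X_f^T (y_f - X_f beta_full), and the assumption introduced for illustration is that the Gram matrix X_r^T X_r can be approximated by rescaling the subsample's Gram matrix. The function and argument names are hypothetical.

```python
import numpy as np


def downdate_least_squares(beta_full, X_forget, y_forget, X_sub, n_remain):
    """Approximate the least-squares fit on the remaining data.

    Illustrative sketch only (not the paper's ULS method).
    beta_full : OLS estimate fitted on all data (remaining + forget).
    X_forget, y_forget : the samples to be forgotten.
    X_sub : covariates of a subsample of the remaining data.
    n_remain : total number of remaining samples.
    """
    m = X_sub.shape[0]
    # Assumption: estimate X_r^T X_r by rescaling the subsample Gram matrix.
    gram_remain = n_remain * (X_sub.T @ X_sub) / m
    # Residuals of the forget samples under the pre-trained model.
    forget_residual = y_forget - X_forget @ beta_full
    # Remove the forget samples' contribution to the normal equations.
    correction = np.linalg.solve(gram_remain, X_forget.T @ forget_residual)
    return beta_full - correction
```

When the subsample is the entire remaining set, the correction is exact and the result coincides with retraining on the remaining data. With a smaller subsample, the Gram estimate adds approximation error on top of the retraining (oracle) error.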
