Coded Machine Unlearning
Nasser Aldaghri, Hessam Mahdavifar, Ahmad Beirami

TL;DR
This paper introduces a coded machine unlearning method using linear encoders to efficiently remove data traces from models, balancing unlearning cost and model performance.
Contribution
The paper proposes a coded learning protocol that enables perfect unlearning with a better trade-off between model performance and unlearning cost than traditional ensemble methods.
Findings
Coded unlearning achieves a better trade-off between model performance and unlearning cost.
The protocol satisfies the perfect unlearning criterion.
In experiments, the coded protocol outperforms uncoded baselines.
Abstract
There are applications that may require removing the trace of a sample from the system, e.g., a user requests their data to be deleted, or corrupted data is discovered. Simply removing a sample from storage units does not necessarily remove its entire trace since downstream machine learning models may store some information about the samples used to train them. A sample can be perfectly unlearned if we retrain all models that used it from scratch with that sample removed from their training dataset. When multiple such unlearning requests are expected to be served, unlearning by retraining becomes prohibitively expensive. Ensemble learning enables the training data to be split into smaller disjoint shards that are assigned to non-communicating weak learners. Each shard is used to produce a weak model. These models are then aggregated to produce the final central model. This setup…
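The uncoded ensemble setup described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's coded protocol: it assumes ridge regression as the weak learner, prediction averaging as the aggregation rule, and synthetic data, and every function name here is hypothetical. It shows only why sharding lowers the cost of an unlearning request: a single shard is retrained instead of the whole model.

```python
import numpy as np

def train_ridge(X, y, lam=1e-3):
    """Weak learner (assumed here): closed-form ridge regression."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def train_ensemble(X, y, shards, lam=1e-3):
    """Train one weak model per disjoint shard; learners never share data."""
    return [train_ridge(X[idx], y[idx], lam) for idx in shards]

def predict(models, X):
    """Aggregate the weak models by averaging their predictions (assumed rule)."""
    return np.mean([X @ w for w in models], axis=0)

def unlearn(X, y, shards, models, sample, lam=1e-3):
    """Remove `sample` from its shard and retrain only that shard's model."""
    for k, idx in enumerate(shards):
        if sample in idx:
            shards[k] = idx[idx != sample]
            models[k] = train_ridge(X[shards[k]], y[shards[k]], lam)
            return k  # index of the single shard that was retrained
    raise ValueError("sample not found in any shard")

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=120)

# Split the sample indices into 4 disjoint shards and train the ensemble.
shards = list(np.array_split(rng.permutation(120), 4))
models = train_ensemble(X, y, shards)

# Serve one unlearning request: only the shard holding sample 7 is retrained.
touched = unlearn(X, y, shards, models, sample=7)
```

Because each weak model depends only on its own shard, the ensemble after `unlearn` matches one trained from scratch on the same shards with the sample removed, while the retraining cost is that of one shard rather than the full dataset.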
