A Certified Unlearning Approach without Access to Source Data
Umit Yigit Basaran, Sk Miraj Ahmed, Amit Roy-Chowdhury, Basak Guler

TL;DR
This paper introduces a certified unlearning method that removes data from trained models without needing access to the original data, using surrogate datasets and statistical distance measures.
Contribution
It presents a novel framework for data unlearning that operates without source data, providing theoretical guarantees and practical noise calibration techniques.
Findings
Effective data removal demonstrated on synthetic datasets
Maintains model utility while ensuring privacy guarantees
Theoretical bounds support practical implementation
Abstract
With the growing adoption of data privacy regulations, the ability to erase private or copyrighted information from trained models has become a crucial requirement. Traditional unlearning methods often assume access to the complete training dataset, which is unrealistic in scenarios where the source data is no longer available. To address this challenge, we propose a certified unlearning framework that enables effective data removal \final{without access to the original training data samples}. Our approach utilizes a surrogate dataset that approximates the statistical properties of the source data, allowing for controlled noise scaling based on the statistical distance between the two. \updated{While our theoretical guarantees assume knowledge of the exact statistical distance, practical implementations typically approximate this distance, resulting in potentially weaker but still…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsPrivacy-Preserving Technologies in Data · Mobile Crowdsensing and Crowdsourcing · Ethics and Social Impacts of AI
