A Certified Unlearning Approach without Access to Source Data

Umit Yigit Basaran; Sk Miraj Ahmed; Amit Roy-Chowdhury; Basak Guler

arXiv:2506.06486 · cs.LG · December 22, 2025

TL;DR

This paper introduces a certified unlearning method that removes data from trained models without needing access to the original data, using surrogate datasets and statistical distance measures.

Contribution

It presents a novel framework for data unlearning that operates without source data, providing theoretical guarantees and practical noise calibration techniques.

Findings

01 Effective data removal demonstrated on synthetic datasets

02 Maintains model utility while ensuring privacy guarantees

03 Theoretical bounds support practical implementation

Abstract

With the growing adoption of data privacy regulations, the ability to erase private or copyrighted information from trained models has become a crucial requirement. Traditional unlearning methods often assume access to the complete training dataset, which is unrealistic in scenarios where the source data is no longer available. To address this challenge, we propose a certified unlearning framework that enables effective data removal without access to the original training data samples. Our approach utilizes a surrogate dataset that approximates the statistical properties of the source data, allowing for controlled noise scaling based on the statistical distance between the two. While our theoretical guarantees assume knowledge of the exact statistical distance, practical implementations typically approximate this distance, resulting in potentially weaker but still…
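The idea of scaling noise with the distance between surrogate and source data can be illustrated with a minimal sketch. It uses the classical Gaussian-mechanism noise scale (valid for epsilon below 1) and then inflates the sensitivity by a term proportional to the statistical distance. The linear inflation, the `slack` constant, and all function names are assumptions made for illustration only; the paper's actual bound and calibration procedure are not reproduced here.

```python
import math
import numpy as np

def gaussian_sigma(sensitivity, epsilon, delta):
    """Classical Gaussian-mechanism noise scale for an (epsilon, delta) guarantee."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

def calibrated_sigma(base_sensitivity, distance, slack, epsilon, delta):
    """Noise scale that grows with the surrogate-to-source statistical distance.

    ASSUMPTION: the sensitivity is inflated linearly, base + slack * distance.
    This is a hypothetical stand-in, not the bound derived in the paper.
    """
    return gaussian_sigma(base_sensitivity + slack * distance, epsilon, delta)

def noisy_release(weights, sigma, rng):
    """Add isotropic Gaussian noise to the (already updated) model weights."""
    return weights + rng.normal(0.0, sigma, size=weights.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = np.zeros(4)  # placeholder for weights after an unlearning update
    for distance in (0.0, 0.1, 0.5):
        sigma = calibrated_sigma(1.0, distance, slack=2.0, epsilon=0.5, delta=1e-5)
        print(distance, sigma, noisy_release(weights, sigma, rng))
```

With a distance of zero the sketch reduces to the ordinary Gaussian mechanism, and a larger distance (a poorer surrogate) yields strictly more noise, which mirrors the utility trade-off the abstract describes.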

Videos

A Certified Unlearning Approach without Access to Source Data · SlidesLive

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Mobile Crowdsensing and Crowdsourcing · Ethics and Social Impacts of AI