Statistical Unlearning of Distributions: A Hypothesis Testing Approach
Aaradhya Pandey, Sanjeev Kulkarni

TL;DR
This paper introduces a statistical framework for distributional unlearning: removing the influence of an entire data domain from a machine learning model while preserving performance on the desired data. The framework comes with finite-sample guarantees and an analysis of its fundamental limits.
Contribution
The paper formalizes distributional unlearning as a hypothesis test, characterizes its fundamental limits, and analyzes its behavior across multiple distribution families and composition scenarios.
Findings
Characterized the allowable data distribution region for unlearning.
Proved composition rules for multimodal unwanted domains.
Provided finite sample guarantees and identified an information-computation gap.
Abstract
Machine learning systems increasingly face requirements to forget not only individual data points, but entire domains of information, such as toxic language, copyrighted corpora, or demographic biases. This raises a fundamental dilemma of statistical-computational tradeoffs: removing all samples from an unwanted domain may be computationally prohibitive, while randomly removing a subset may not provide distribution-level statistical guarantees. We propose a statistical framework for distributional unlearning, in which domains are modeled as probability distributions, and the goal is to remove a carefully chosen subset of samples that reduces the effect of an unwanted distribution while preserving performance on a desired one. We formalize this using a hypothesis test of the edited data with the desired and unwanted domains, leading to an interpretable and robust criterion for selecting…
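As a rough illustration of the idea in the abstract (not the paper's actual algorithm, which is not given in this summary), the following minimal sketch models the desired and unwanted domains as one-dimensional Gaussians and removes the samples that a likelihood-ratio test most strongly attributes to the unwanted domain. The Gaussian means, the sample sizes, the removal budget `k`, and all function names are hypothetical choices made only for this example. It compares the selective removal against random removal of the same number of samples.

```python
import math
import random


def gaussian_logpdf(x, mean, std=1.0):
    """Log-density of a univariate Gaussian."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def llr_unwanted_vs_desired(x, desired_mean, unwanted_mean):
    """Log-likelihood ratio log q(x)/p(x); larger means 'looks more unwanted'."""
    return gaussian_logpdf(x, unwanted_mean) - gaussian_logpdf(x, desired_mean)


def mean_llr(samples, desired_mean, unwanted_mean):
    """Average log-likelihood ratio, used here as a simple test statistic."""
    total = sum(llr_unwanted_vs_desired(x, desired_mean, unwanted_mean) for x in samples)
    return total / len(samples)


def remove_by_llr(samples, k, desired_mean, unwanted_mean):
    """Drop the k samples with the largest log-likelihood ratio."""
    ranked = sorted(
        samples,
        key=lambda x: llr_unwanted_vs_desired(x, desired_mean, unwanted_mean),
    )
    return ranked[: len(ranked) - k]


def remove_at_random(samples, k, rng):
    """Baseline: drop k samples chosen uniformly at random."""
    keep = rng.sample(range(len(samples)), len(samples) - k)
    return [samples[i] for i in keep]


# Hypothetical setup: desired domain N(0, 1), unwanted domain N(3, 1).
DESIRED_MEAN, UNWANTED_MEAN = 0.0, 3.0
rng = random.Random(0)
data = [rng.gauss(DESIRED_MEAN, 1.0) for _ in range(800)]
data += [rng.gauss(UNWANTED_MEAN, 1.0) for _ in range(200)]

k = 100  # removal budget, assumed smaller than the unwanted sample count
selective = remove_by_llr(data, k, DESIRED_MEAN, UNWANTED_MEAN)
randomized = remove_at_random(data, k, rng)

print("mean LLR, original: ", mean_llr(data, DESIRED_MEAN, UNWANTED_MEAN))
print("mean LLR, selective:", mean_llr(selective, DESIRED_MEAN, UNWANTED_MEAN))
print("mean LLR, random:   ", mean_llr(randomized, DESIRED_MEAN, UNWANTED_MEAN))
```

With the same budget, the selective edit yields the lowest average log-likelihood ratio of any subset of that size, so the edited data looks less like the unwanted domain than it does after random removal. In this toy setting the densities are known; in practice they would have to be estimated, which this sketch does not address.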
