Statistical Unlearning of Distributions: A Hypothesis Testing Approach

Aaradhya Pandey; Sanjeev Kulkarni

arXiv:2605.16645·math.ST·May 19, 2026

TL;DR

This paper introduces a statistical framework for distributional unlearning: removing a carefully chosen subset of samples to reduce the influence of an entire unwanted data domain on a machine learning model while preserving performance on a desired domain, with theoretical guarantees.

Contribution

The paper formalizes distributional unlearning as a hypothesis test, characterizes its fundamental limits, and analyzes its behavior across multiple distribution families and composition scenarios.

Findings

01. Characterized the region of data distributions allowable for unlearning.

02. Proved composition rules for multimodal unwanted domains.

03. Provided finite-sample guarantees and identified an information-computation gap.

Abstract

Machine learning systems increasingly face requirements to forget not only individual data points, but entire domains of information, such as toxic language, copyrighted corpora, or demographic biases. This raises a fundamental dilemma of statistical-computational tradeoffs: removing all samples from an unwanted domain may be computationally prohibitive, while randomly removing a subset may not provide distribution-level statistical guarantees. We propose a statistical framework for distributional unlearning, in which domains are modeled as probability distributions, and the goal is to remove a carefully chosen subset of samples that reduces the effect of an unwanted distribution while preserving performance on a desired one. We formalize this using a hypothesis test of the edited data with the desired and unwanted domains, leading to an interpretable and robust criterion for selecting…
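
To make the idea of testing edited data against a desired and an unwanted domain concrete, here is a toy sketch. It is not the paper's algorithm: the abstract is truncated before the selection criterion is stated. The sketch assumes two known, equal-variance 1-D Gaussians (desired P = N(0, 1), unwanted Q = N(3, 1)), a fixed removal budget, and a simple rule that drops the samples with the largest log-likelihood ratio log q(x)/p(x). The names `log_lr`, `edit_dataset`, and `mean_log_lr` are hypothetical and chosen only for illustration.

```python
import math
import random


def log_lr(x, mu_p=0.0, mu_q=3.0, sigma=1.0):
    """Log-likelihood ratio log q(x)/p(x) for two equal-variance Gaussians.

    Assumption: both domains are known Gaussians; P is desired, Q is unwanted.
    """
    return ((x - mu_p) ** 2 - (x - mu_q) ** 2) / (2 * sigma ** 2)


def edit_dataset(samples, budget):
    """Drop the `budget` samples that look most like the unwanted domain Q.

    Assumption: a greedy top-k rule on the log-likelihood ratio,
    used here only as an illustrative selection criterion.
    """
    ranked = sorted(samples, key=log_lr)  # ascending: most P-like first
    return ranked[: len(samples) - budget]


def mean_log_lr(samples):
    """Average log-likelihood ratio; negative values favor P over Q."""
    return sum(log_lr(x) for x in samples) / len(samples)


# Synthetic mixture: 800 desired samples and 200 unwanted samples.
rng = random.Random(0)
desired = [rng.gauss(0.0, 1.0) for _ in range(800)]
unwanted = [rng.gauss(3.0, 1.0) for _ in range(200)]
data = desired + unwanted

# Remove only part of the unwanted mass (150 of 1000 samples).
edited = edit_dataset(data, budget=150)
before = mean_log_lr(data)
after = mean_log_lr(edited)
print(f"mean log LR before: {before:.3f}, after: {after:.3f}")
```

The mean log-likelihood ratio acts as the test statistic in this sketch: the more negative it is, the more strongly a likelihood-ratio test would attribute the edited data to the desired domain rather than the unwanted one. Removing a targeted subset lowers it without deleting every unwanted sample, which is the tradeoff the abstract describes.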
