Distributional Machine Unlearning via Selective Data Removal
Youssef Allouah, Rachid Guerraoui, Sanmi Koyejo

TL;DR
This paper introduces a distributional unlearning framework that efficiently removes unwanted data domains from machine learning models by selecting small, influential data subsets, significantly reducing the amount of data needing removal while maintaining model performance.
Contribution
It formalizes distributional unlearning with a Pareto optimality framework, derives theoretical bounds for Gaussian models, and proposes a distance-based selection algorithm that improves sample efficiency over random removal.
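A minimal sketch of what a distance-based selection rule could look like. The summary only says that points are chosen by distance. The specific rule here (score each forget-set sample by its Euclidean distance to the empirical mean of the retain set, then remove the farthest samples first) is an assumption for illustration, as are the function name `select_for_removal`, the `budget` parameter, and the synthetic Gaussian data.

```python
import numpy as np

def select_for_removal(forget, retain, budget):
    """Return indices of the `budget` forget-set rows farthest from the retain mean.

    Assumed scoring rule (not confirmed by the summary): Euclidean distance
    to the empirical mean of the retain set, largest distances removed first.
    """
    retain_mean = retain.mean(axis=0)
    dist = np.linalg.norm(forget - retain_mean, axis=1)
    # argsort is ascending, so reverse it and keep the first `budget` indices.
    return np.argsort(dist)[::-1][:budget]

# Hypothetical synthetic data: two 2-D Gaussians with different means.
rng = np.random.default_rng(0)
retain = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
forget = rng.normal(loc=2.0, scale=1.0, size=(500, 2))

idx = select_for_removal(forget, retain, budget=100)
kept = np.delete(forget, idx, axis=0)  # forget-set samples left after editing
```

A random-removal baseline would instead draw `idx` uniformly at random, for example with `rng.choice(len(forget), size=100, replace=False)`.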
Findings
Requires 15-82% less data removal than full deletion
Quadratically more sample-efficient than random removal
Effective across synthetic, text, and image datasets
Abstract
Machine learning systems increasingly face requirements to remove entire domains of information--such as toxic language or biases--rather than individual user data. This task presents a dilemma: full removal of the unwanted domain data is computationally expensive, while random partial removal is statistically inefficient. We find that a domain's statistical influence is often concentrated in a small subset of its data samples, suggesting a path between ineffective partial removal and unnecessary complete removal. We formalize this as distributional unlearning: a framework to select a small subset that balances forgetting an unwanted distribution while preserving a desired one. Using Kullback-Leibler divergence constraints, we derive the exact removal-preservation Pareto frontier for Gaussian distributions and prove that models trained on the edited data achieve corresponding log-loss…
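To make the two KL quantities concrete, here is a small sketch using the standard closed form for the KL divergence between univariate Gaussians. The parameter values and the names `forget`, `retain`, and `candidate` are hypothetical, and the abstract does not give the paper's exact constraint thresholds, so none are assumed here.

```python
import math

def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """KL(N(mu_p, sigma_p^2) || N(mu_q, sigma_q^2)) in nats, univariate case."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p**2 + (mu_p - mu_q)**2) / (2 * sigma_q**2)
            - 0.5)

# Hypothetical (mean, std) pairs chosen only for illustration.
forget = (2.0, 1.0)      # unwanted distribution
retain = (0.0, 1.0)      # desired distribution
candidate = (0.5, 1.0)   # distribution of the edited data

removal = kl_gaussian(*forget, *candidate)       # larger means more forgetting
preservation = kl_gaussian(*retain, *candidate)  # smaller means better preservation

print(removal, preservation)  # 1.125 0.125
```

With equal variances the formula reduces to the squared mean gap divided by twice the variance, which is why moving the candidate toward the retain mean lowers the preservation term and raises the removal term at the same time.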
Peer Reviews
Decision: ICLR 2026 Poster
- The paper does a good job of defining and motivating the proposed "distributional unlearning" framework.
- The framework is built on a strong theoretical foundation:
  - Using KL divergence to define the forget and retain objectives is simple yet effective.
  - The proposed selective removal algorithm is quite intuitive and is derived directly from the theoretical analysis. The authors also analyze theoretically how the selective removal algorithm is a strategy for maximizing the KL divergence of t
- The framework is built on the assumption that the unwanted and retained data sets are already known. While the authors assert that this could be done via some upstream process (e.g., keyword filtering), there could be other scenarios where such techniques do not work.
- I'm not entirely sure whether the core motivation, that a domain's influence is concentrated in a small subset, holds in the experimental results. For example, in CIFAR10 the removal is only observed after 50% deletion and in Jigsaw
- The paper is very well written, with clear exposition and excellent presentation. The theoretical claims are sound.
- The introduction of distributional machine unlearning represents a novel and interesting contribution, offering a perspective on how data deletion can be studied from a statistical standpoint.
- The inclusion of coreset-based methods as an additional baseline is appropriate and well-motivated.
- The theoretical results are derived for data following two Gaussian distributions (a strong assumption) and for two simple removal strategies. While these allow for closed-form analysis, they are not particularly practical for real-world unlearning.
- The use of Kullback–Leibler (KL) divergence between data distributions as the central measure does not directly capture the notion of “forgetting” at the model level. KL divergence quantifies average differences between data distributions, not how muc
The paper's primary strength is its elegant formalization of domain unlearning as a distributional problem. By defining the objectives using KL divergence and characterizing the optimal trade-off via a Pareto frontier, the authors provide a rigorous, data-centric foundation for a problem that is often treated with ad-hoc heuristics. The connection established in Proposition 2 between the data-level KL objectives and the downstream model's expected log-loss is particularly powerful, as it provide
1. Novelty: The core algorithm (removing points from one set based on their distance to the mean of another) is mechanically very simple. The novelty does not lie in a complex algorithmic contribution but rather in the application of this simple heuristic to the unlearning problem and, most importantly, the new theoretical framework that justifies it. While the framework is novel, the method itself could be seen as an application of standard outlier-detection or data-cleaning principles.
2. Experi
Taxonomy
Topics: Neural Networks and Applications · Mineral Processing and Grinding
