TL;DR
This paper introduces a novel streaming unlearning algorithm that efficiently removes the knowledge of specific training data from a model as removal requests arrive over time; it formalizes unlearning as a distribution shift problem and provides theoretical guarantees.
Contribution
The paper proposes a new streaming unlearning paradigm based on a distribution shift formulation, together with a novel algorithm that forgets efficiently without access to the original training data and is supported by theoretical analysis.
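To make the streaming setting concrete, the sketch below processes removal requests one at a time in arrival order and updates a logistic-regression model using only the current forget batch. It is an illustration under stated assumptions, not the paper's SAFE algorithm: the update rule is plain gradient ascent on the forget loss (a common unlearning baseline swapped in here), and the names `stream_unlearn`, `lr`, and `steps` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical data layout: (feature vector, label in {0, 1}).
Example = Tuple[Sequence[float], int]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def avg_loss(w: Sequence[float], batch: Sequence[Example]) -> float:
    """Mean binary cross-entropy of weights w on a batch."""
    total = 0.0
    for x, y in batch:
        p = min(max(predict(w, x), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(batch)


def avg_grad(w: Sequence[float], batch: Sequence[Example]) -> List[float]:
    """Gradient of avg_loss with respect to w: mean of (p - y) * x."""
    g = [0.0] * len(w)
    for x, y in batch:
        err = predict(w, x) - y
        for i, xi in enumerate(x):
            g[i] += err * xi
    return [gi / len(batch) for gi in g]


def stream_unlearn(w, requests, lr=0.1, steps=5):
    """Handle forget requests in arrival order (hypothetical helper).

    Each update reads only the current forget batch (no retained or
    original training data) and takes `steps` gradient-ascent steps on
    its loss. Returns the final weights and a (before, after) loss log.
    """
    w = list(w)
    log = []
    for batch in requests:
        before = avg_loss(w, batch)
        for _ in range(steps):
            g = avg_grad(w, batch)
            w = [wi + lr * gi for wi, gi in zip(w, g)]  # ascent, not descent
        log.append((before, avg_loss(w, batch)))
    return w, log


# Usage: two removal requests arriving at different times.
# Each feature vector starts with a constant 1.0 acting as a bias term.
w0 = [0.0, 1.0]
requests = [
    [([1.0, 2.0], 1), ([1.0, 1.5], 1)],
    [([1.0, -1.0], 0)],
]
w_final, log = stream_unlearn(w0, requests)
for i, (before, after) in enumerate(log):
    print(f"request {i}: forget loss {before:.3f} -> {after:.3f}")
```

Because the mean logistic loss is convex, each ascent step can only raise the current batch's loss, so every logged `after` value is at least its `before` value. Nothing in this naive rule protects accuracy on the retained data, which is the performance-maintenance challenge the abstract highlights.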
Findings
The algorithm achieves an $O(\sqrt{T} + V_T)$ regret bound.
Experiments validate the method's effectiveness across various models and datasets.
The method operates efficiently in streaming data removal scenarios.
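The regret quantities in the first finding can be illustrated with a toy computation. The sketch assumes quadratic per-round losses $f_t(x) = (x - c_t)^2$, compares plain online gradient descent against the per-round minimizers $c_t$ (dynamic regret), and measures drift as the path length $\sum_t |c_{t+1} - c_t|$. Reading $V_T$ as such a drift term is an assumption, since the summary does not define it; plain online gradient descent is a generic stand-in rather than the paper's algorithm, and `ogd_dynamic_regret` and `path_length` are hypothetical names.

```python
from typing import Sequence


def path_length(targets: Sequence[float]) -> float:
    """Drift of the per-round minimizers: sum of |c_{t+1} - c_t|."""
    return sum(abs(b - a) for a, b in zip(targets, targets[1:]))


def ogd_dynamic_regret(targets: Sequence[float], lr: float = 0.25,
                       x0: float = 0.0) -> float:
    """Online gradient descent on f_t(x) = (x - c_t)^2.

    Dynamic regret compares against the per-round minimizer u_t = c_t.
    Since f_t(c_t) = 0, the regret equals the learner's cumulative loss.
    """
    x = x0
    regret = 0.0
    for c in targets:
        regret += (x - c) ** 2      # loss incurred by the current iterate
        x -= lr * 2.0 * (x - c)     # gradient step after observing f_t
    return regret


T = 20
static = [1.0] * T
drifting = [1.0 if t % 2 == 0 else -1.0 for t in range(T)]
for name, cs in [("static", static), ("drifting", drifting)]:
    print(f"{name}: path length {path_length(cs):.1f}, "
          f"dynamic regret {ogd_dynamic_regret(cs):.3f}")
```

In this setup, a fixed minimizer keeps the cumulative regret below a constant (the per-round losses shrink geometrically), while an alternating minimizer with a large path length forces a loss of at least 1 in every round. That drift-dependent growth is the kind of term a bound such as $O(\sqrt{T} + V_T)$ has to account for.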
Abstract
Machine unlearning aims to remove the knowledge of specific training data from a well-trained model. Current machine unlearning methods typically handle all data to be forgotten in a single batch, removing the corresponding knowledge all at once upon request. In practice, however, data removal requests often arrive as a stream rather than as a single batch, which reduces the efficiency and effectiveness of existing methods. The challenges of streaming forgetting have received little research attention. In this paper, to address the challenges of performance maintenance, efficiency, and data access posed by streaming unlearning requests, we introduce a streaming unlearning paradigm that formalizes unlearning as a distribution shift problem. We then estimate the altered distribution and propose a novel streaming unlearning algorithm to achieve efficient…
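The distribution-shift view in the abstract can be made concrete with a short derivation. Assume (an illustrative assumption, not necessarily the paper's estimator) that the training distribution $p$ is a mixture of the retained distribution $p_r$ and the forget distribution $p_f$ with forget fraction $\alpha \in [0, 1)$, and write $\mathcal{L}_q(\theta) = \mathbb{E}_{z \sim q}[\ell(\theta, z)]$ for the expected loss under a distribution $q$. Linearity of expectation then gives:

```latex
p = (1-\alpha)\,p_r + \alpha\,p_f
\;\Longrightarrow\;
p_r = \frac{p - \alpha\,p_f}{1-\alpha},
\qquad
\mathcal{L}_{p_r}(\theta) = \frac{\mathcal{L}_{p}(\theta) - \alpha\,\mathcal{L}_{p_f}(\theta)}{1-\alpha}.

\text{If } \nabla\mathcal{L}_{p}(\theta^\ast) = 0 \text{ at the trained model } \theta^\ast:
\qquad
\nabla\mathcal{L}_{p_r}(\theta^\ast) = -\frac{\alpha}{1-\alpha}\,\nabla\mathcal{L}_{p_f}(\theta^\ast).
```

At such a stationary point, a descent step on the retained loss is a rescaled ascent step on the forget loss, computable from the forget data alone. The identity relies on $\nabla\mathcal{L}_{p}(\theta^\ast) = 0$, so it does not carry over unchanged once the model has moved after earlier requests in a stream.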
Peer Reviews
Decision: Submitted to ICLR 2025
1. This paper is well written and easy to follow. The presentation and organization of the paper are good. 2. This paper tackles an important problem, namely conducting unlearning in a streaming manner, which has not been well studied in the unlearning literature. 3. The idea of viewing unlearning as a distribution shift problem has not been well explored before; the approach is relatively novel compared to existing works. 4. The paper also provides theoretical insights into the
1. Technical contributions: In Line 65, the authors mention that only two studies (Zhao et al., 2024; Li et al., 2021) have directly tackled the problem of stream data forgetting. However, it is unclear what technical novelties this work offers over those studies. It would help if the authors better illustrated their technical contributions and advantages in the revised paper. 2. Baselines: During experiments, it is not clear why the above two works (Zhao et al., 2024;
- This paper considers a practical setting for machine unlearning, i.e., the streaming setting, where new data may continue to arrive or be generated during the unlearning process. This new setting brings many new challenges compared to the widely studied static/batched settings, and the paper addresses them by proposing a new algorithm, SAFE. - The performance of the SAFE algorithm was (a) analyzed with a regret upper bound guarantee; and (b) numerically evaluated using several datasets a
- The performance analysis of SAFE is relatively weak or may be very straightforward given the existing literature. For instance, the error in the t-th round is simply bounded by $O(\sqrt{T})$, which is straightforward and likely not a tight bound. - As noted by the authors, the accumulated regret bound is also not tight compared to existing results. - Although MNIST, FASHION, and CIFAR-10 are widely used, it may be more interesting to consider more complex datasets su
S1. The streaming setup is an interesting, novel, and seemingly useful way to think about the problem. I think this paper would interest both the “unlearning” community and the online learning community. S2. The theoretical derivations have clear assumptions that do not seem excessive. Bounded gradients (Lipschitz continuity) are a mild assumption in general. The theoretical derivations are correct, at least from my checking. (I do have some minor concerns listed later.) S3. Some evidence fo
W1. While the theoretical grounding is excellent, the experiments are too small in scale to be convincing, and the statistical analysis is poor or missing. Using only small benchmarks raises doubts about whether the relative improvement over baselines generalizes beyond small image datasets. Can the authors explain the choice of datasets? Why not use larger datasets (such as ImageNet or its variants)? Why focus entirely on image datasets? Since the algorithm operates