Replay Can Provably Increase Forgetting
Yasaman Mahdaviyeh, James Lucas, Mengye Ren, Andreas S. Tolias, Richard Zemel, Toniann Pitassi

TL;DR
This paper provides a theoretical and empirical analysis of sample replay in continual learning, revealing that replay can sometimes increase forgetting and that its effectiveness depends on task relationships and sample selection.
Contribution
The work offers the first theoretical analysis showing that replay can be harmful and that forgetting can be non-monotonic in the number of replayed samples, and demonstrates this phenomenon in both linear models and neural networks.
Findings
Replay can increase forgetting even with more samples.
The effectiveness of replay depends on task relationships and sample selection.
Harmful replay behavior is observed in neural networks, not just linear models.
Abstract
Continual learning seeks to enable machine learning systems to solve an increasing corpus of tasks sequentially. A critical challenge for continual learning is forgetting, where the performance on previously learned tasks decreases as new tasks are introduced. One of the commonly used techniques to mitigate forgetting, sample replay, has been shown empirically to reduce forgetting by retaining some examples from old tasks and including them in new training episodes. In this work, we provide a theoretical analysis of sample replay in an over-parameterized continual linear regression setting, where each task is given by a linear subspace and with enough replay samples, one would be able to eliminate forgetting. Our analysis focuses on sample replay and highlights the role of the replayed samples and the relationship between task subspaces. Surprisingly, we find that, even in a noiseless…
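The setting described in the abstract can be sketched numerically. The following is a minimal sketch, not the authors' code: it assumes two noiseless tasks that share one ground-truth vector, that each training phase moves to the closest interpolating solution (the usual limit of gradient descent on over-parameterized least squares), and that forgetting is the task-1 training loss after task 2. All names (`fit_from`, `forgetting_with_replay`, `w_star`) and the Gaussian data are illustrative assumptions.

```python
import numpy as np

def fit_from(w_init, X, y):
    """Closest point to w_init (in Euclidean norm) that satisfies X w = y.

    Assumes the system is consistent, which holds in the noiseless case.
    """
    return w_init + np.linalg.pinv(X) @ (y - X @ w_init)

def forgetting_with_replay(X1, X2, w_star, m, rng):
    """Train on task 1, then on task 2 plus m replayed task-1 samples.

    Returns the mean squared error on the task-1 training data afterwards.
    """
    y1, y2 = X1 @ w_star, X2 @ w_star          # noiseless labels (assumption)
    w1 = fit_from(np.zeros_like(w_star), X1, y1)
    idx = rng.permutation(len(X1))[:m]          # uniformly chosen replay samples
    X_aug = np.vstack([X2, X1[idx]])
    y_aug = np.concatenate([y2, y1[idx]])
    w2 = fit_from(w1, X_aug, y_aug)
    return float(np.mean((X1 @ w2 - y1) ** 2))

# Hypothetical sizes: d parameters, n samples per task, with d > 2n.
d, n = 50, 10
rng = np.random.default_rng(0)
w_star = rng.standard_normal(d)
X1 = rng.standard_normal((n, d))
X2 = rng.standard_normal((n, d))

for m in range(n + 1):
    f = forgetting_with_replay(X1, X2, w_star, m, np.random.default_rng(1))
    print(f"replayed samples: {m:2d}  forgetting: {f:.3e}")
```

Replaying all `n` task-1 samples drives forgetting to zero up to floating-point error, matching the abstract's remark that enough replay samples eliminate forgetting. Random Gaussian tasks like these are not expected to reproduce the harmful case, which according to the summary above depends on the relationship between the task subspaces and on which samples are replayed.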
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
1. Theory on replay-based CL is very limited, and this paper makes an attempt in this direction by examining when replaying a single exemplar can increase forgetting. 2. The theoretical results are verified in the experiments. 3. The presentation is clear.
1. The assumptions are too strong, particularly assumption 2.2. It is hard to justify how this can be true in practice. The assumptions 3.1 and 3.3 are also restrictive. For overparameterized linear models, assuming a constant sample norm is not widely seen. Given the fact that investigating linear models is already a restrictive setup, making these additional assumptions further weakens the importance of the theoretical results. 2. The definition of forgetting in equation 3 is based the traini
- The paper is well written and easy to read. - The paper tackles an important theoretical question, that is the problem of how to analyze replay-based continual learning.
The paper makes a very strong statement: "Replay can provably increase forgetting". Results on replay may be the most robust piece of evidence in the whole continual learning literature. Therefore, I expect that claiming that replay provably increases forgetting requires exceptional evidence. I would argue that the results of the paper are more a property of the extremely limited setting than a general property of replay-based methods. - (line 65) the paper argues that it is counter-intuitive th
- Theoretical analyses in the area of CL are scarce. The authors identified a gap between previous work and current methods, which can help to understand the limitations and strengths of current memory-based methods, as well as help to understand some empirical results that may be unintuitive. - The work has a clear motivation, and the authors identified a need to increase the theoretical understanding in this research area.
- I agree with some of the authors' conclusions, but they ignore an essential body of work in CL that focused on the selection of items stored in memory. Although many of these papers focus on empirical studies, they reach similar and, in some cases, more robust conclusions than those found in this paper. - The authors' analysis is based on simple models and scenarios, which can often be difficult to extrapolate to more complex scenarios. If empirical results are presented, I recommend also
They studied the negative impact of replay in CL, which is indeed a surprising topic. Their theoretical results explain the reason for the negative impact, which is further verified in experiments.
Both the theoretical (Theorem 3.6) and experimental results focus on T=2, which is not general enough. Theorem 3.2 is an extreme example, which makes the result less surprising. Furthermore, the experiments should at least show what happens when T is larger than 2.
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Visual Attention and Saliency Detection
Methods: Stochastic Gradient Descent · Linear Regression
