Replay Can Provably Increase Forgetting

Yasaman Mahdaviyeh; James Lucas; Mengye Ren; Andreas S. Tolias; Richard Zemel; Toniann Pitassi

arXiv:2506.04377·cs.LG·June 6, 2025

Open Access · 4 Reviews

TL;DR

This paper provides a theoretical and empirical analysis of sample replay in continual learning, revealing that replay can sometimes increase forgetting and that its effectiveness depends on task relationships and sample selection.

Contribution

The work offers the first theoretical analysis showing that replay can be harmful and non-monotonic, and demonstrates this phenomenon in both linear models and neural networks.

Findings

01. Replay can increase forgetting even with more samples.

02. The effectiveness of replay depends on task relationships and sample selection.

03. Harmful replay behavior is observed in neural networks, not just linear models.

Abstract

Continual learning seeks to enable machine learning systems to solve an increasing corpus of tasks sequentially. A critical challenge for continual learning is forgetting, where the performance on previously learned tasks decreases as new tasks are introduced. One of the commonly used techniques to mitigate forgetting, sample replay, has been shown empirically to reduce forgetting by retaining some examples from old tasks and including them in new training episodes. In this work, we provide a theoretical analysis of sample replay in an over-parameterized continual linear regression setting, where each task is given by a linear subspace and with enough replay samples, one would be able to eliminate forgetting. Our analysis focuses on sample replay and highlights the role of the replayed samples and the relationship between task subspaces. Surprisingly, we find that, even in a noiseless…
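The setting described in the abstract can be sketched numerically. The following is a minimal illustration under stated assumptions, not the authors' code: it assumes two tasks that share one noiseless ground-truth weight vector, random Gaussian inputs, training that converges to the interpolating solution closest to the starting weights, and forgetting measured as the mean squared training error on the first task. The names `min_norm_update`, `forgetting`, and `run` are hypothetical. Random Gaussian tasks are used only for convenience; they are not the specific task subspaces the paper analyzes, so this sketch is not expected to reproduce the harmful-replay effect by itself.

```python
import numpy as np


def min_norm_update(w_start, X, y):
    """Return the point closest to w_start (in L2) that satisfies X @ w = y.

    In over-parameterized, noiseless linear regression this is the solution
    that gradient descent initialized at w_start converges to.
    """
    residual = y - X @ w_start
    return w_start + np.linalg.pinv(X) @ residual


def forgetting(w, X1, y1):
    """Mean squared training error on task 1 (assumed forgetting measure)."""
    return float(np.mean((X1 @ w - y1) ** 2))


def run(num_replay, d=50, n=10, seed=0):
    """Train on task 1, then on task 2 plus num_replay samples from task 1."""
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(d)          # shared noiseless ground truth
    X1 = rng.standard_normal((n, d))         # task 1 inputs (d > 2n)
    X2 = rng.standard_normal((n, d))         # task 2 inputs
    y1, y2 = X1 @ w_star, X2 @ w_star

    # Task 1: minimum-norm interpolator starting from zero.
    w1 = min_norm_update(np.zeros(d), X1, y1)

    # Task 2: fit task 2 data together with the replayed task 1 samples.
    X = np.vstack([X2, X1[:num_replay]])
    y = np.concatenate([y2, y1[:num_replay]])
    w2 = min_norm_update(w1, X, y)

    return forgetting(w2, X1, y1)


if __name__ == "__main__":
    for m in (0, 1, 5, 10):
        print(f"replay samples = {m:2d}, forgetting = {run(m):.3e}")
```

Replaying all task 1 samples (`num_replay` equal to `n`) drives forgetting to zero up to floating-point error, matching the abstract's remark that enough replay samples eliminate forgetting. Exploring the non-monotonic behavior would require replacing the random inputs with deliberately chosen task subspaces.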

Peer Reviews

Decision·ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 3 · Confidence 4

Strengths

1. Theory on replay-based CL is very limited, and this paper provides an attempt in this direction to understand when replaying one exemplar can increase forgetting.
2. The theoretical results are verified in the experiments.
3. The presentation is clear.

Weaknesses

1. The assumptions are too strong, particularly Assumption 2.2. It is hard to justify how this can be true in practice. Assumptions 3.1 and 3.3 are also restrictive. For overparameterized linear models, assuming a constant sample norm is not widely seen. Given that investigating linear models is already a restrictive setup, making these additional assumptions further weakens the importance of the theoretical results.
2. The definition of forgetting in equation 3 is based on the traini

Reviewer 02 · Rating 3 · Confidence 4

Strengths

- The paper is well written and easy to read.
- The paper tackles an important theoretical question, that is the problem of how to analyze replay-based continual learning.

Weaknesses

The paper makes a very strong statement: "Replay can provably increase forgetting". Results on replay may be the most robust piece of evidence in the whole continual learning literature. Therefore, I expect that claiming that replay provably increases forgetting requires exceptional evidence. I would argue that the results of the paper are more a property of the extremely limited setting than a general property of replay-based methods.
- (line 65) the paper argues that it is counter-intuitive th

Reviewer 03 · Rating 3 · Confidence 4

Strengths

- Theoretical analyses in the area of CL are scarce. The authors identified a gap between previous work and current methods, which can help to understand the limitations and strengths of current memory-based methods, as well as help to understand some empirical results that may be unintuitive.
- The work has a clear motivation, and the authors identified a need to increase the theoretical understanding in this research area.

Weaknesses

- I agree with some of the authors' conclusions, but they ignore an essential body of work in CL that focused on the selection of items stored in memory. Although many of these papers focus on empirical studies, they reach similar and, in some cases, more robust conclusions than those found in this paper.
- The authors' analysis is based on simple models and scenarios, which can often be difficult to extrapolate to more complex scenarios. If empirical results are presented, I recommend also

Reviewer 04 · Rating 3 · Confidence 4

Strengths

They studied the negative impact of replay in CL, which is indeed a surprising topic. Their theoretical results explain the cause of the negative impact, which is further verified in experiments.

Weaknesses

Both the theoretical (Theorem 3.6) and experimental results focus on T=2, which is not general enough. Theorem 3.2 is an extreme example, which makes the result less surprising. Furthermore, the experiments should at least show what happens when T is larger than 2.

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Visual Attention and Saliency Detection

Methods: Stochastic Gradient Descent · Linear Regression