TL;DR
This paper investigates how memorization affects continual learning, revealing that high-memorization samples are forgotten faster but are essential for optimal performance, especially with larger memory buffers.
Contribution
The study clarifies the role of memorization in incremental learning and introduces a memorization proxy to improve buffer sampling strategies.
Findings
High-memorization samples are forgotten faster.
Memorization is necessary for peak performance.
Including high-memorization samples benefits large buffer training.
Abstract
Memorization impacts the performance of deep learning algorithms. Prior works have studied memorization primarily in the context of generalization and privacy. This work studies the memorization effect on incremental learning scenarios. Forgetting prevention and memorization seem similar. However, one should discuss their differences. We designed extensive experiments to evaluate the impact of memorization on continual learning. We clarified that learning examples with high memorization scores are forgotten faster than regular samples. Our findings also indicated that memorization is necessary to achieve the highest performance. However, at low memory regimes, forgetting regular samples is more important. We showed that the importance of a high-memorization score sample rises with an increase in the buffer size. We introduced a memorization proxy and employed it in the buffer policy…
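The abstract mentions a memorization proxy used in the buffer policy but does not define it here; one reviewer describes it as training-iteration based. The following is only a minimal sketch under that assumption, not the authors' method. It scores each sample by the first evaluation step after which it stays correctly classified (later means harder, taken here as a stand-in for higher memorization), then reserves a share of the replay buffer for the highest-scoring samples. The names `first_learned_iteration`, `select_buffer`, and `high_fraction` are hypothetical.

```python
import random


def first_learned_iteration(correct_history):
    """Assumed proxy: first step from which the sample stays correct.

    correct_history is a list of bools, one per evaluation step.
    Returns len(correct_history) if the sample is not correct at the end.
    """
    learned_at = len(correct_history)
    # Walk backwards over the trailing run of correct predictions.
    for step in range(len(correct_history) - 1, -1, -1):
        if not correct_history[step]:
            break
        learned_at = step
    return learned_at


def select_buffer(proxy_scores, buffer_size, high_fraction, seed=0):
    """Hypothetical buffer policy mixing high-proxy and random samples.

    proxy_scores maps sample id -> proxy score (higher = assumed more memorized).
    high_fraction is the share of the buffer reserved for top-scoring samples.
    """
    rng = random.Random(seed)
    ranked = sorted(proxy_scores, key=proxy_scores.get, reverse=True)
    n_high = min(round(buffer_size * high_fraction), buffer_size, len(ranked))
    high, rest = ranked[:n_high], ranked[n_high:]
    n_rest = max(0, min(buffer_size - n_high, len(rest)))
    # Fill the remaining slots uniformly at random from the other samples.
    return high + rng.sample(rest, n_rest)


# Toy per-sample correctness histories over three evaluation steps (made up).
histories = {
    "a": [True, True, True],
    "b": [False, False, True],
    "c": [False, True, True],
    "d": [True, False, False],
}
scores = {sid: first_learned_iteration(h) for sid, h in histories.items()}
buffer = select_buffer(scores, buffer_size=2, high_fraction=0.5)
```

Given the reported finding that high-memorization samples matter more as the buffer grows, one might raise `high_fraction` with `buffer_size` and keep it near zero in low-memory regimes; that schedule is an assumption, not something the abstract specifies.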
Peer Reviews
Decision·Submitted to ICLR 2026
* This work is one of the few existing works studying memorization in continual learning, which I found extremely interesting, important, and potentially significant. In this sense, the work would be original and significant. However, in my view the paper suffers from fundamental flaws that invalidate its primary conclusions (see weaknesses).
* While the paper's title suggests a broad study of the role of memorization in continual learning, the focus is only on class-incremental continual learning, so the stated findings at best could only be applicable to this narrow problem. Thus, the study conducted in Section 3.2 on the role of a varying number of classes could be completely irrelevant to broad continual learning setups.
* In a similar vein, the paper calculates memorization scores in an offline, stationary setting (i.e., training on the f
1. Provides a novel and focused investigation of memorization in continual learning, a topic rarely examined beyond generalization or privacy contexts.
2. Clearly distinguishes memorization from forgetting prevention, offering conceptual clarity that helps refine the theoretical framing of continual learning.
3. Includes comprehensive experiments across multiple datasets (CIFAR10, CIFAR100, Tiny ImageNet), architectures (ResNet18/34/50), and memory settings, ensuring robust and generalizable f
1. The paper is at times difficult to follow, and several descriptions lack precision. For example, in Figure 5, it is unclear what the training iterations on the y-axis represent. Upon checking the appendix, it appears that the y-axis corresponds to the proposed training-iteration–based memorization proxy, but this is not clearly stated in the main text. If this interpretation is incorrect, please clarify what is actually plotted. The manuscript would benefit from improved figure captions and c
1. Through well-designed experiments on standard continual learning benchmarks (such as the CIFAR100 split), the authors reveal a pronounced correlation between memory scores, buffer size, and forgetting behavior.
2. The explicit study of how memorized examples behave under continual learning is a valuable contribution.
3. I think the paper is well-written and generally easy to follow.
1. Despite robust experimental validation, the paper offers a limited theoretical explanation for why highly memorized samples are more prone to forgetting in continual learning.
2. This study argues that memory should be considered in incremental training, but it does not explore how much additional computational or memory overhead would be introduced by calculating memory scores or implementing proxy strategies in practical applications.
