IDER: IDempotent Experience Replay for Reliable Continual Learning
Zhanwang Liu, Yuting Li, Haoyuan Gao, Yexin Li, Linghe Kong, Lichao Sun, Weiran Huang

TL;DR
This paper introduces IDER, a novel continual learning method that enhances model reliability and reduces forgetting by enforcing idempotence, with minimal computational overhead and compatibility with existing replay techniques.
Contribution
The paper proposes a new idempotent experience replay approach that improves reliability and accuracy in continual learning without high computational costs.
Findings
Consistently improves prediction reliability across benchmarks.
Reduces catastrophic forgetting effectively.
Seamlessly integrates with existing CL methods.
Abstract
Catastrophic forgetting, the tendency of neural networks to forget previously learned knowledge when learning new tasks, has been a major challenge in continual learning (CL). To tackle this challenge, CL methods have been proposed and shown to reduce forgetting. Furthermore, CL models deployed in mission-critical settings can benefit from uncertainty awareness by calibrating their predictions to reliably assess their confidences. However, existing uncertainty-aware continual learning methods suffer from high computational overhead and incompatibility with mainstream replay methods. To address this, we propose idempotent experience replay (IDER), a novel approach based on the idempotent property, where repeated function applications yield the same output. Specifically, we first adapt the training loss to make the model idempotent on current data streams. In addition, we introduce an…
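The idempotent property mentioned in the abstract (applying a function repeatedly yields the same output as applying it once) can be illustrated with a small, generic sketch. This is not the IDER training loss, which the truncated abstract does not spell out; the helper `idempotence_distance` and the toy functions `clip_unit` and `halve` are hypothetical names introduced here only for illustration.

```python
import math

def idempotence_distance(f, x):
    """Euclidean distance between f(x) and f(f(x)); zero when f is idempotent at x.

    Hypothetical helper for illustration, not the paper's definition.
    """
    once = f(x)
    twice = f(once)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(once, twice)))

def clip_unit(v):
    # Projection onto [0, 1]^n: a second application changes nothing.
    return [min(1.0, max(0.0, a)) for a in v]

def halve(v):
    # Scaling by 0.5: each application moves the output again.
    return [0.5 * a for a in v]

x = [-1.0, 0.25, 3.0]
d_clip = idempotence_distance(clip_unit, x)   # idempotent function
d_halve = idempotence_distance(halve, x)      # non-idempotent function
print(d_clip, d_halve)
```

A distance of zero indicates that the function is idempotent at that input, while a positive distance measures how far the second application drifts from the first.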
Peer Reviews
Decision: ICLR 2026 Poster
The paper is well written and organized. Adaptation of the idempotence loss function to the continual learning setting is creative and nuanced. The experimental results are promising.
I think giving more intuition about the loss function would be helpful for the reader. For example, in lines 250-252, it is stated that minimizing equation 5 biases $f_t$ towards the wrong label (even though with probability $1-P$ that objective would be minimized in eq 5), but why would $f_{t-1}$ not have the same problems? Expanding that explanation would be great.
• The manuscript introduces idempotence, a mathematical property, to tackle catastrophic forgetting and poor model calibration in continual learning.
• The proposed IDER is a lightweight framework and functions as a plug-and-play module for performance gains.
• The paper primarily relies on intuition and empirical success to introduce idempotence. It lacks a rigorous theoretical analysis or hypothesis for why enforcing output stability should directly mitigate catastrophic forgetting at a fundamental level.
• The empirical validation is comprehensive on CIFAR-10, CIFAR-100, and Tiny-ImageNet. However, to firmly establish the method's practicality and generalizability, evaluation on a large-scale dataset, e.g., ImageNet-1K, is needed.
• The hyperparameter sensit
The idea of using the idempotent property to mitigate poor calibration and recency bias is straightforward and intuitive. The paper is well written and easy to follow. The proposed method is shown to be effective.
As stated in Lines 71–74, the paper claims a strong correlation between the idempotence distance and prediction error. However, it remains unclear whether this relationship has been formally analysed. Could the authors provide empirical evidence or theoretical justification for this claim? To enable idempotence with respect to the second input, the proposed method divides the backbone into two parts. What principles guide this division? Does choosing different partition points (e.g., splitting
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications
