# Improvements to dark experience replay and reservoir sampling for better balance between consolidation and plasticity

**Authors:** Taisuke Kobayashi

PMC · DOI: 10.3389/frai.2026.1649239 · 2026-02-19

## TL;DR

This paper improves dark experience replay and reservoir sampling to better balance memory retention and adaptability in continual learning.

## Contribution

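For dark experience replay (DER), the study introduces automatic weight adaptation, blocking of inconsistent replay data, and correction of past outputs. For reservoir sampling (RS), it introduces a generalized acceptance probability, stratification across multiple buffers, and intentional omission of inconsistent data.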

## Key findings

- The proposed improvements achieved steady performance gains across regression, classification, and reinforcement learning benchmarks.
- Blocking inconsistent data and correcting past outputs reduced negative impacts of distribution shifts.
- Stratified buffers and a generalized acceptance probability address the RS buffer's tendency to gradually stop storing data for new skills.

## Abstract

Continual learning is one of the most essential abilities for autonomous agents, which can incrementally learn daily-life skills even with limited computational resources. To achieve this goal, a simple yet powerful method called dark experience replay (DER) was recently proposed. DER mitigates catastrophic forgetting, in which skills acquired in the past are unintentionally forgotten when learning new skills, by stochastically storing streaming data in a reservoir sampling (RS) buffer and either relearning them or retaining their past outputs. However, because DER considers multiple objectives, it does not function properly without appropriate weighting for each problem. In addition, retaining past outputs inhibits learning if those outputs are inconsistent owing to distribution shifts or other effects. Both issues stem from the trade-off between memory consolidation and plasticity. The same trade-off is hidden in the RS buffer, which gradually stops storing new data for new skills as data are continuously passed to it. To alleviate this trade-off and achieve a better balance, this study proposes improvement strategies for both DER and RS. Specifically, DER is improved by the automatic adaptation of weights, blocking of replaying inconsistent data, and correction of past outputs. RS is improved by the generalization of the acceptance probability, stratification of multiple buffers, and intentional omission of inconsistent data. These improvements were verified using multiple benchmarks, including regression, classification, and reinforcement learning problems. Consequently, the proposed methods achieved a steady improvement in learning performance by balancing memory consolidation and plasticity.
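
To make the baseline concrete, the following is a minimal sketch of the standard (unimproved) mechanisms the abstract refers to, not the paper's proposed method. It assumes the textbook form of reservoir sampling, where the n-th streamed item is accepted with probability capacity / n, and a simplified DER objective that adds a weighted mean squared error between current and stored outputs to the new-task loss. The names `ReservoirBuffer`, `der_loss`, and `alpha` are hypothetical and chosen for illustration only.

```python
import random


class ReservoirBuffer:
    """Fixed-size buffer filled by standard reservoir sampling (illustrative)."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # number of stream items offered so far
        self.rng = random.Random(seed)

    def acceptance_probability(self):
        # Standard RS: the n-th item is stored with probability capacity / n,
        # so new data are accepted less and less often as the stream grows.
        return min(1.0, self.capacity / max(self.seen, 1))

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
            return True
        if self.rng.random() < self.acceptance_probability():
            # Overwrite a uniformly chosen slot with the new item.
            self.items[self.rng.randrange(self.capacity)] = item
            return True
        return False


def der_loss(new_loss, current_outputs, stored_outputs, alpha):
    """New-task loss plus alpha times the MSE to stored past outputs.

    alpha is the hand-tuned weight (an assumed name) that balances
    plasticity (new_loss) against consolidation (the MSE term).
    """
    squared = [(c - s) ** 2 for c, s in zip(current_outputs, stored_outputs)]
    consolidation = sum(squared) / len(squared)
    return new_loss + alpha * consolidation


buffer = ReservoirBuffer(capacity=4, seed=0)
for t in range(1000):
    buffer.add((t, float(t)))  # hypothetical (input, past output) pair

print(len(buffer.items), buffer.acceptance_probability())
print(der_loss(0.5, [1.0, 2.0], [1.0, 4.0], alpha=0.1))
```

After 1000 streamed items the acceptance probability in this sketch has fallen to 4 / 1000, which illustrates why a plain RS buffer gradually stops storing data for new skills. The fixed `alpha` likewise shows where manual weighting enters the baseline objective.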

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12961966/full.md

---
Source: https://tomesphere.com/paper/PMC12961966