Don't Forget the Critic: Value-Based Data Rehearsal for Multi-Cyclic Continual Reinforcement Learning

Benjamin Poole; Andrew Quinn; Li Yang; Minwoo Lee

arXiv:2605.22454·cs.LG·May 22, 2026

TL;DR

This paper introduces Qreg+NWLU, a novel data rehearsal method for Deep Q-Networks in multi-cyclic continual reinforcement learning, improving learning efficiency and reducing forgetting.

Contribution

The paper extends data rehearsal to value function approximation in multi-cyclic CRL and proposes two simple modifications, continuous data rehearsal and "No-Wait" regularization, that enhance performance.

Findings

1. Qreg+NWLU outperforms existing methods in multi-cyclic environments.
2. Continuous data rehearsal improves knowledge retention.
3. Immediate regularization enhances learning efficiency.

Abstract

Data rehearsal has emerged as a leading approach for mitigating catastrophic forgetting in Continual Reinforcement Learning (CRL). However, existing work remains confined to policy gradient frameworks, regularizing only actors due to the performance degradation incurred by critic regularization. This actor-centric approach overlooks the potential of data rehearsal for value function approximation. Moreover, existing evaluations in CRL rarely consider multi-cyclic environments where task sequences repeat, a critical real-world scenario that exacerbates forgetting and loss of plasticity. We investigate data rehearsal for Deep Q-Networks using Q-value regularization in multi-cyclic settings and propose Qreg+NWLU, which introduces two simple modifications: (1) continuous data rehearsal that dynamically collects and updates stored Q-values throughout training, and (2) "No-Wait" regularization that…
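To make the idea of Q-value regularization concrete, the following is a minimal sketch of how a rehearsal penalty might be combined with a standard one-step TD loss. It is not the authors' Qreg+NWLU implementation: the function names (`td_loss`, `q_regularization`, `combined_loss`), the mean-squared penalty, and the `reg_weight` parameter are all assumptions chosen for illustration, and the sketch does not attempt to model the continuous-rehearsal or "No-Wait" modifications.

```python
from typing import Sequence


def td_loss(q_sa: float, reward: float, next_q_values: Sequence[float],
            gamma: float = 0.99, done: bool = False) -> float:
    """Squared one-step TD error for a single transition (standard DQN target)."""
    target = reward if done else reward + gamma * max(next_q_values)
    return (q_sa - target) ** 2


def q_regularization(current_q: Sequence[Sequence[float]],
                     stored_q: Sequence[Sequence[float]]) -> float:
    """Mean squared distance between the network's current Q-values and the
    Q-values stored for rehearsed states (hypothetical penalty form)."""
    total, count = 0.0, 0
    for cur_row, old_row in zip(current_q, stored_q):
        for cur, old in zip(cur_row, old_row):
            total += (cur - old) ** 2
            count += 1
    return total / count if count else 0.0


def combined_loss(td: float, reg: float, reg_weight: float = 1.0) -> float:
    """TD loss plus a weighted rehearsal penalty (reg_weight is an assumed knob)."""
    return td + reg_weight * reg


# One transition on the current task and one rehearsed state from an earlier task.
td = td_loss(q_sa=1.0, reward=1.0, next_q_values=[0.0, 2.0], gamma=0.5)
reg = q_regularization(current_q=[[1.0, 2.0]], stored_q=[[1.0, 4.0]])
loss = combined_loss(td, reg, reg_weight=0.5)
```

In this toy setup the penalty pulls the current Q-values for rehearsed states back toward their stored values, which is the general mechanism by which a rehearsal term can limit forgetting in a value function.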
