Stable Continual Reinforcement Learning via Diffusion-based Trajectory Replay

Feng Chen; Fuguang Han; Cong Guan; Lei Yuan; Zhilong Zhang; Yang Yu; Zongzhang Zhang

arXiv:2411.10809·cs.LG·November 19, 2024

Open Access

TL;DR

This paper introduces DISTR, a diffusion-based trajectory replay method for continual reinforcement learning. DISTR mitigates catastrophic forgetting by memorizing the high-return trajectories of previously encountered tasks and prioritizing the replay of pivotal tasks, and it outperforms existing continual RL baselines on the Continual World benchmark.
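The summary does not say how pivotal tasks are scored or how a limited replay budget is divided among them. As a minimal sketch, assuming each past task already carries a non-negative priority score (the scoring rule, the helper name `allocate_replay_budget`, and the task names are all hypothetical, not taken from the paper), a fixed number of replayed trajectories could be split in proportion to those scores:

```python
from typing import Dict


def allocate_replay_budget(priorities: Dict[str, float], budget: int) -> Dict[str, int]:
    # Hypothetical helper: split `budget` replayed trajectories across past tasks
    # in proportion to a non-negative priority score per task.
    if budget < 0:
        raise ValueError("budget must be non-negative")
    if any(p < 0 for p in priorities.values()):
        raise ValueError("priorities must be non-negative")
    if not priorities:
        return {}
    total = sum(priorities.values())
    # If every score is zero, fall back to uniform replay across tasks.
    weights = priorities if total > 0 else {k: 1.0 for k in priorities}
    total = sum(weights.values())
    exact = {k: budget * w / total for k, w in weights.items()}
    counts = {k: int(v) for k, v in exact.items()}  # floor of each proportional share
    # Largest-remainder rounding: give the leftover slots to the largest fractional parts
    # so that the per-task counts sum exactly to `budget`.
    leftover = budget - sum(counts.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts
```

For example, `allocate_replay_budget({"task_0": 3.0, "task_1": 1.0}, 8)` assigns six replayed trajectories to `task_0` and two to `task_1`. Proportional allocation is only one plausible design; the paper's actual prioritization mechanism may differ.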

Contribution

The paper proposes a diffusion model-based replay mechanism for continual RL that addresses the limited expressive capacity of existing generative replay methods and improves both stability and plasticity when learning multiple tasks in sequence.
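For readers unfamiliar with the generative model involved, a diffusion model is trained by progressively corrupting data with Gaussian noise and learning to reverse that corruption. The sketch below shows only the textbook DDPM forward (noising) step, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps; it is not code from the paper. Treating a trajectory as a flat list of floats and using a linear beta schedule are assumptions made for illustration.

```python
import math
import random
from typing import List, Optional


def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    # Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule.
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0: List[float], t: int, alpha_bars: List[float],
             noise: Optional[List[float]] = None) -> List[float]:
    # Forward process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I).
    if noise is None:
        noise = [random.gauss(0.0, 1.0) for _ in x0]
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
```

During training, a denoising network would learn to predict `noise` from `(x_t, t)`; at replay time, running the learned reverse process from pure noise yields synthetic trajectories instead of stored ones.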

Findings

01. DISTR outperforms existing continual RL baselines on the Continual World benchmark.
02. The diffusion-based replay effectively preserves task knowledge and improves success rates.
03. Prioritization of pivotal tasks enhances learning efficiency and stability.

Abstract

Given the inherent non-stationarity prevalent in real-world applications, continual Reinforcement Learning (RL) aims to equip the agent with the capability to address a series of sequentially presented decision-making tasks. Within this problem setting, a pivotal challenge is the catastrophic forgetting issue, wherein the agent is prone to easily eroding the decision-making knowledge associated with previously encountered tasks when learning a new one. In recent progress, generative replay methods have shown substantial potential by employing generative models to replay the data distributions of past tasks. Compared to storing the data from past tasks directly, this category of methods avoids the growing storage overhead and possible data privacy concerns. However, constrained by the expressive capacity of generative models, existing generative…
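The generative replay idea described in the abstract can be illustrated with a small sketch: while learning a new task, each training batch mixes real samples from the current task with synthetic samples drawn from a generator for each past task, so past data never has to be kept. Here a per-feature Gaussian fit is swapped in as a toy generator in place of the diffusion model the paper uses, each sample is assumed to be a flat list of floats, and the names `GaussianGenerator` and `mixed_batch` are hypothetical.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple

Sample = List[float]


class GaussianGenerator:
    # Toy stand-in for a learned generative model: fits an independent Gaussian
    # per feature while the task's data is available, then samples from it.
    def __init__(self, data: Sequence[Sample]) -> None:
        cols = list(zip(*data))  # assumes non-empty data with equal-length samples
        self.means = [statistics.fmean(c) for c in cols]
        self.stds = [statistics.pstdev(c) for c in cols]

    def sample(self, n: int, rng: random.Random) -> List[Sample]:
        return [[rng.gauss(m, s) for m, s in zip(self.means, self.stds)] for _ in range(n)]


def mixed_batch(current: Sequence[Sample], generators: Dict[str, GaussianGenerator],
                n_current: int, n_replay_per_task: int,
                rng: random.Random) -> List[Tuple[str, Sample]]:
    # Real samples from the current task plus synthetic samples replayed for every
    # past task; only the generators of past tasks are kept, not their raw data.
    batch = [("current", s) for s in rng.sample(list(current), n_current)]
    for task, gen in generators.items():
        batch.extend((task, s) for s in gen.sample(n_replay_per_task, rng))
    rng.shuffle(batch)
    return batch
```

The storage cost of this scheme grows with the size of the generators rather than with the amount of past data, which is the trade-off the abstract highlights; how faithfully past tasks are preserved then depends on how expressive the generator is.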

Taxonomy

Topics: Traffic control and management

Methods: Diffusion