Continual Reinforcement Learning by Planning with Online World Models

Zichen Liu; Guoji Fu; Chao Du; Wee Sun Lee; Min Lin

arXiv:2507.09177·cs.LG·July 15, 2025

Open Access

TL;DR

This paper introduces a continual reinforcement learning approach that plans with an online world model. The model avoids forgetting by construction, letting an agent learn multiple tasks sequentially, and the method outperforms deep world models paired with various continual learning techniques.

Contribution

The paper proposes an online world model learned with a Follow-The-Leader (FTL) approach, paired with a model-predictive-control planner, to mitigate catastrophic forgetting in continual reinforcement learning (CRL).

Findings

01

The online world model has a proven regret bound of $\mathcal{O}(\sqrt{K^2 D \log(T)})$.

02

The proposed method outperforms deep world models with various continual learning techniques.

03

Empirical results on Continual Bench show effective lifelong learning without forgetting.
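
For context, and assuming the bound in finding 01 refers to regret in the standard online-learning sense (this page does not define the loss or the symbols $K$ and $D$), regret compares the learner's cumulative loss with that of the best fixed model in hindsight, and Follow-The-Leader picks the model that minimizes the cumulative loss so far. The symbols $W_t$ and $\ell_t$ below are illustrative notation, not taken from the paper:

$$
\mathrm{Regret}_T = \sum_{t=1}^{T} \ell_t(W_t) - \min_{W} \sum_{t=1}^{T} \ell_t(W),
\qquad
W_{t+1} \in \arg\min_{W} \sum_{\tau=1}^{t} \ell_\tau(W).
$$

A bound of order $\sqrt{K^2 D \log(T)}$ grows sublinearly in $T$, so the average regret per step, $\mathrm{Regret}_T / T$, goes to zero as $T$ grows.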

Abstract

Continual reinforcement learning (CRL) refers to a naturalistic setting where an agent needs to endlessly evolve, by trial and error, to solve multiple tasks that are presented sequentially. One of the largest obstacles to CRL is that the agent may forget how to solve previous tasks when learning a new task, known as catastrophic forgetting. In this paper, we propose to address this challenge by planning with online world models. Specifically, we learn a Follow-The-Leader shallow model online to capture the world dynamics, in which we plan using model predictive control to solve a set of tasks specified by any reward functions. The online world model is immune to forgetting by construction with a proven regret bound of $\mathcal{O}(\sqrt{K^{2} D \log(T)})$ under mild assumptions. The planner searches actions solely based on the latest online model, thus forming an FTL Online Agent (OA) that…
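
The loop described in the abstract (fit a shallow model by Follow-The-Leader, then plan in it with model predictive control) can be illustrated with a minimal sketch. This is not the paper's implementation: the random cosine feature map, the small ridge term (which strictly makes this a regularized variant of FTL), the random-shooting planner, and the names `FTLLinearModel` and `plan` are all assumptions made for illustration.

```python
import numpy as np


class FTLLinearModel:
    """Hypothetical shallow dynamics model: fixed random features + linear readout.

    The readout is the least-squares solution over ALL transitions seen so far
    (Follow-The-Leader for squared loss, plus a small ridge term for stability).
    """

    def __init__(self, state_dim, action_dim, num_features=64, ridge=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = state_dim + action_dim
        # Fixed random projection; it is never trained (an assumption of this sketch).
        self.P = rng.normal(size=(in_dim, num_features)) / np.sqrt(in_dim)
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
        # Sufficient statistics of the cumulative squared loss.
        self.A = ridge * np.eye(num_features)
        self.B = np.zeros((num_features, state_dim))
        self.W = np.zeros((num_features, state_dim))

    def features(self, s, a):
        x = np.concatenate([s, a], axis=-1)
        return np.cos(x @ self.P + self.b)

    def update(self, s, a, s_next):
        """Add one transition and re-solve for the cumulative-loss minimizer."""
        phi = self.features(s, a)
        self.A += np.outer(phi, phi)
        self.B += np.outer(phi, s_next - s)  # the model predicts the state change
        # Full re-solve for clarity; a rank-one update would be cheaper.
        self.W = np.linalg.solve(self.A, self.B)

    def predict(self, s, a):
        return s + self.features(s, a) @ self.W


def plan(model, reward_fn, s0, action_dim, horizon=10, num_samples=256, rng=None):
    """Random-shooting MPC: sample action sequences, roll them out in the model,
    and return the first action of the best sequence."""
    rng = np.random.default_rng() if rng is None else rng
    actions = rng.uniform(-1.0, 1.0, size=(num_samples, horizon, action_dim))
    states = np.repeat(s0[None, :], num_samples, axis=0)
    returns = np.zeros(num_samples)
    for t in range(horizon):
        states = model.predict(states, actions[:, t])
        returns += reward_fn(states, actions[:, t])
    return actions[np.argmax(returns), 0]


if __name__ == "__main__":
    # Toy dynamics (hypothetical): s' = s + 0.1 * a, with a reward for reaching the origin.
    rng = np.random.default_rng(0)
    model = FTLLinearModel(state_dim=1, action_dim=1)
    for _ in range(200):
        s, a = rng.uniform(-1, 1, size=1), rng.uniform(-1, 1, size=1)
        model.update(s, a, s + 0.1 * a)
    reward = lambda s, a: -np.sum(s ** 2, axis=-1)
    print(plan(model, reward, np.array([0.5]), action_dim=1, rng=rng))
```

Because the readout depends only on the sums `A` and `B`, the fitted model is the same regardless of the order in which transitions arrive, which is one way to read "immune to forgetting by construction". Swapping `reward_fn` changes the task without retraining the model.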

Taxonomy

Topics: Reinforcement Learning in Robotics