Self-Composing Policies for Scalable Continual Reinforcement Learning

Mikel Malagón; Josu Ceberio; Jose A. Lozano

arXiv:2506.14811·cs.LG·June 19, 2025

TL;DR

This paper presents a growable, modular neural network architecture for continual reinforcement learning. It avoids catastrophic forgetting, accelerates learning on the current task by reusing previous policies, and grows linearly in parameters with the number of tasks, while outperforming alternative methods.

Contribution

Introduces a modular, growable neural network architecture whose parameter count scales linearly with the number of tasks and which enhances knowledge transfer in continual reinforcement learning.

Findings

01

Achieves greater knowledge transfer and performance than alternative methods on benchmark continuous control and visual problems.

02

Prevents catastrophic forgetting and interference.

03

The number of parameters grows linearly with the number of tasks, without sacrificing plasticity.

Abstract

This work introduces a growable and modular neural network architecture that naturally avoids catastrophic forgetting and interference in continual reinforcement learning. The structure of each module allows the selective combination of previous policies along with its internal policy, accelerating the learning process on the current task. Unlike previous growing neural network approaches, we show that the number of parameters of the proposed approach grows linearly with respect to the number of tasks, and does not sacrifice plasticity to scale. Experiments conducted in benchmark continuous control and visual problems reveal that the proposed approach achieves greater knowledge transfer and performance than alternative methods.
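
The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class names (`Module`, `GrowingPolicy`), the linear internal policy, and the attention-style weighting used for the "selective combination" are all assumptions made for illustration, since the abstract does not specify these details. The sketch only shows two properties the abstract claims: earlier modules are never modified when a new one is added, and each module has a fixed size, so the total parameter count grows linearly with the number of tasks.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()


class Module:
    """Hypothetical module: an internal policy plus attention-style mixing.

    The mixing weights are an assumption; the source only says that previous
    policies are selectively combined with the module's internal policy.
    """

    def __init__(self, state_dim, n_actions, key_dim, rng):
        self.W_int = rng.normal(0.0, 0.1, (state_dim, n_actions))  # internal policy
        self.W_q = rng.normal(0.0, 0.1, (state_dim, key_dim))      # query from the state
        self.W_k = rng.normal(0.0, 0.1, (n_actions, key_dim))      # key from each candidate

    def num_params(self):
        # Independent of how many previous modules exist.
        return self.W_int.size + self.W_q.size + self.W_k.size

    def forward(self, state, prev_outputs):
        internal = softmax(state @ self.W_int)
        # Candidates: outputs of all earlier modules plus the internal policy.
        candidates = np.stack(prev_outputs + [internal])   # (k + 1, n_actions)
        query = state @ self.W_q                            # (key_dim,)
        keys = candidates @ self.W_k                        # (k + 1, key_dim)
        weights = softmax(keys @ query / np.sqrt(query.size))
        # Convex combination of probability vectors is a probability vector.
        return weights @ candidates


class GrowingPolicy:
    """Hypothetical container that adds one module per task."""

    def __init__(self, state_dim, n_actions, key_dim=8, seed=0):
        self.state_dim, self.n_actions, self.key_dim = state_dim, n_actions, key_dim
        self.rng = np.random.default_rng(seed)
        self.modules = []

    def add_task(self):
        # Earlier modules are left untouched; only the new module would be trained.
        self.modules.append(
            Module(self.state_dim, self.n_actions, self.key_dim, self.rng)
        )

    def act_probs(self, state, task_id):
        outputs = []
        for module in self.modules[: task_id + 1]:
            outputs.append(module.forward(state, outputs))
        return outputs[-1]

    def num_params(self):
        return sum(m.num_params() for m in self.modules)


if __name__ == "__main__":
    policy = GrowingPolicy(state_dim=4, n_actions=3)
    for _ in range(3):
        policy.add_task()
        print(len(policy.modules), "tasks ->", policy.num_params(), "parameters")
```

With these toy dimensions each module holds 68 parameters (12 + 32 + 24), so the total after k tasks is 68k. Because the keys are computed from the candidate action distributions rather than from per-module learned embeddings, a module's size does not depend on how many modules precede it; this design choice is what keeps the growth linear in the sketch.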

Taxonomy

Topics: Modular Robots and Swarm Intelligence