Continual Policy Distillation from Distributed Reinforcement Learning Teachers
Yuxuan Li, Qijun He, Mingqi Yuan, Wen-Tse Chen, Jeff Schneider, Jiayu Chen

TL;DR
This paper introduces a teacher-student framework for continual reinforcement learning that distills distributed single-task teachers into a generalist model, improving scalability and reducing forgetting across tasks.
Contribution
It proposes a novel decoupled approach that combines distributed RL teachers with policy distillation and a mixture-of-experts (MoE) architecture for scalable continual RL.
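The summary does not say how the MoE student routes inputs. For readers unfamiliar with the idea, here is a minimal sketch of a generic top-k gated mixture-of-experts forward pass. The gating scheme, `top_k`, and every function name below (`softmax`, `moe_forward`) are hypothetical illustrations, not the authors' architecture.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over gating logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, gate_weights: List[Vector],
                experts: List[Expert], top_k: int = 2) -> Vector:
    """Hypothetical top-k gated MoE forward pass (not the paper's design)."""
    # One gating logit per expert: dot product of that expert's gate row with x.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    probs = softmax(logits)
    # Keep the top_k experts by gate probability and renormalize their weights.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    weighted = []
    for i in chosen:
        w = probs[i] / norm
        weighted.append([w * y for y in experts[i](x)])
    # Element-wise sum of the weighted expert outputs.
    return [sum(col) for col in zip(*weighted)]


def double(v: Vector) -> Vector:
    return [2.0 * a for a in v]


def negate(v: Vector) -> Vector:
    return [-a for a in v]


# With all-zero gate weights both experts get equal weight, so the output
# is the average of [2.0, 4.0] and [-1.0, -2.0].
print(moe_forward([1.0, 2.0], [[0.0, 0.0], [0.0, 0.0]], [double, negate]))  # [0.5, 1.0]
```

Routing each input to a few specialised experts is one common way to add capacity for new tasks without overwriting shared parameters, which is the kind of stability/plasticity trade-off the framework targets.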
Findings
Achieves over 85% of teacher performance on Meta-World.
Limits task-wise forgetting to within 10%.
Enhances stability and plasticity in continual policy learning.
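The findings above do not define "teacher performance" or "task-wise forgetting". A minimal sketch under common conventions follows. It assumes per-task success rates, a student-to-teacher ratio averaged over tasks, and forgetting measured as the drop from a task's best earlier score to its final score. These definitions, the function names, and the task names are illustrative assumptions, not necessarily the paper's exact metrics.

```python
from typing import Dict, List


def teacher_normalized_score(student: Dict[str, float],
                             teacher: Dict[str, float]) -> float:
    """Assumed metric: mean of student/teacher success ratio over tasks."""
    ratios = [student[t] / teacher[t] for t in teacher if teacher[t] > 0]
    if not ratios:
        raise ValueError("need at least one task with nonzero teacher score")
    return sum(ratios) / len(ratios)


def max_forgetting(history: Dict[str, List[float]]) -> float:
    """Assumed metric: largest per-task drop from best earlier score to final score.

    history[task] lists the student's score on that task after each training
    stage, starting from the stage where the task was first learned.
    A negative drop would indicate the task improved later.
    """
    drops = [max(s[:-1]) - s[-1] for s in history.values() if len(s) >= 2]
    return max(drops, default=0.0)


# Hypothetical numbers, for illustration only.
student = {"reach": 0.9, "push": 0.8}
teacher = {"reach": 1.0, "push": 0.9}
history = {"reach": [0.95, 0.90, 0.92], "push": [0.80, 0.75]}
print(teacher_normalized_score(student, teacher))
print(max_forgetting(history))
```

Under these assumed definitions, "over 85% of teacher performance" would mean `teacher_normalized_score` exceeds 0.85, and "forgetting within 10%" would mean `max_forgetting` stays at or below 0.10.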
Abstract
Continual Reinforcement Learning (CRL) aims to develop lifelong learning agents that continuously acquire knowledge across diverse tasks while mitigating catastrophic forgetting. This requires efficiently managing the stability-plasticity dilemma and leveraging prior experience to rapidly generalize to novel tasks. While various strategies addressing both aspects have been proposed, achieving scalable performance by directly applying RL to sequential task streams remains challenging. In this paper, we propose a novel teacher-student framework that decouples CRL into two independent processes: training single-task teacher models through distributed RL, and continually distilling them into a central generalist model. This design is motivated by the observation that RL excels at solving single tasks, while policy distillation (a relatively stable supervised learning process) is well…
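To make "policy distillation as supervised learning" concrete, here is a minimal sketch of one common distillation objective for continuous-control policies: the mean KL divergence KL(teacher ‖ student) between diagonal Gaussian action distributions over a batch of states. The choice of KL direction, the Gaussian parameterisation, and the names `gaussian_kl` and `distillation_loss` are assumptions for illustration; the abstract does not specify the paper's loss.

```python
import math
from typing import List, Sequence, Tuple

# (mean, std) for each action dimension of a diagonal Gaussian policy.
Gaussian = Tuple[Sequence[float], Sequence[float]]


def gaussian_kl(teacher: Gaussian, student: Gaussian) -> float:
    """KL(teacher || student) for diagonal Gaussians, summed over action dims."""
    mu_t, std_t = teacher
    mu_s, std_s = student
    kl = 0.0
    for mt, st, ms, ss in zip(mu_t, std_t, mu_s, std_s):
        kl += math.log(ss / st) + (st ** 2 + (mt - ms) ** 2) / (2 * ss ** 2) - 0.5
    return kl


def distillation_loss(teacher_outputs: List[Gaussian],
                      student_outputs: List[Gaussian]) -> float:
    """Hypothetical objective: mean KL(teacher || student) over a batch of states.

    In a continual setting the batch could mix states from the current task's
    teacher with replayed states from earlier teachers (an assumption here).
    """
    if not teacher_outputs or len(teacher_outputs) != len(student_outputs):
        raise ValueError("batches must be non-empty and of equal length")
    total = sum(gaussian_kl(t, s) for t, s in zip(teacher_outputs, student_outputs))
    return total / len(teacher_outputs)


# One state where the student matches the teacher (KL = 0) and one where its
# mean is off by 1 with unit std (KL = 0.5), so the batch mean is 0.25.
teacher_batch = [([0.0], [1.0]), ([1.0], [1.0])]
student_batch = [([0.0], [1.0]), ([0.0], [1.0])]
print(distillation_loss(teacher_batch, student_batch))  # 0.25
```

Because the targets come from fixed teachers rather than from the student's own exploration and bootstrapped returns, minimising this loss is an ordinary regression-style problem, which is one reason distillation tends to be more stable than running RL directly on a task stream.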
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics · Intelligent Tutoring Systems and Adaptive Learning
