Continual Deep Reinforcement Learning with Task-Agnostic Policy Distillation
Muhammad Burhan Hafez, Kerim Erekmen

TL;DR
This paper introduces TAPD, a task-agnostic framework for continual deep reinforcement learning that enhances sample efficiency and mitigates forgetting by exploring environments without specific goals, then distilling this knowledge for downstream tasks.
Contribution
The paper proposes a novel task-agnostic policy distillation approach that improves continual learning by enabling exploration without task labels and distilling knowledge for efficient downstream task solving.
Findings
Improved sample efficiency in downstream tasks.
Reduced catastrophic forgetting during continual learning.
Effective exploration without task-specific guidance.
Abstract
Central to the development of universal learning systems is the ability to solve multiple tasks without retraining from scratch when new data arrives. This is crucial because each task requires significant training time. Addressing the problem of continual learning necessitates various methods due to the complexity of the problem space. This problem space includes: (1) addressing catastrophic forgetting to retain previously learned tasks, (2) demonstrating positive forward transfer for faster learning, (3) ensuring scalability across numerous tasks, and (4) facilitating learning without requiring task labels, even in the absence of clear task boundaries. In this paper, the Task-Agnostic Policy Distillation (TAPD) framework is introduced. This framework alleviates problems (1)-(4) by incorporating a task-agnostic phase, where an agent explores its environment without any external goal…
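As an illustration of the distillation step mentioned in the abstract, the following is a minimal sketch of a generic policy-distillation loss: the KL divergence from a teacher policy's action distribution to a student's. It is not taken from the paper and may differ from the objective TAPD actually uses. The names `softmax`, `distillation_loss`, `teacher_logits`, `student_logits`, and `temperature` are hypothetical, and the logits here are made-up values rather than outputs of trained networks.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Mean KL(teacher || student) over a batch of states.

    Assumption: both policies output action logits of shape
    (batch, num_actions). The temperature is applied to the teacher only.
    """
    p = softmax(teacher_logits, temperature)  # teacher action probabilities
    q = softmax(student_logits)               # student action probabilities
    eps = 1e-12                               # guards against log(0)
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return float(kl.mean())


if __name__ == "__main__":
    # Hypothetical logits for two states and three actions.
    teacher = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
    student = np.array([[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]])
    print(distillation_loss(teacher, student))
```

Minimizing this loss with respect to the student's parameters pulls the student's action distribution toward the teacher's on the sampled states. The loss is zero when the two distributions coincide.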
Taxonomy
Topics: Reinforcement Learning in Robotics · EEG and Brain-Computer Interfaces
