Continual Deep Reinforcement Learning with Task-Agnostic Policy Distillation

Muhammad Burhan Hafez; Kerim Erekmen

arXiv:2411.16532·cs.LG·January 8, 2025

TL;DR

This paper introduces TAPD, a task-agnostic framework for continual deep reinforcement learning that enhances sample efficiency and mitigates forgetting by exploring environments without specific goals, then distilling this knowledge for downstream tasks.

Contribution

The paper proposes a novel task-agnostic policy distillation approach that improves continual learning by enabling exploration without task labels and distilling knowledge for efficient downstream task solving.

Findings

1. Improved sample efficiency in downstream tasks.
2. Reduced catastrophic forgetting during continual learning.
3. Effective exploration without task-specific guidance.

Abstract

Central to the development of universal learning systems is the ability to solve multiple tasks without retraining from scratch when new data arrives. This is crucial because each task requires significant training time. Addressing the problem of continual learning necessitates various methods due to the complexity of the problem space. This problem space includes: (1) addressing catastrophic forgetting to retain previously learned tasks, (2) demonstrating positive forward transfer for faster learning, (3) ensuring scalability across numerous tasks, and (4) facilitating learning without requiring task labels, even in the absence of clear task boundaries. In this paper, the Task-Agnostic Policy Distillation (TAPD) framework is introduced. This framework alleviates problems (1)-(4) by incorporating a task-agnostic phase, where an agent explores its environment without any external goal…
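
To make the distillation step mentioned above more concrete, here is a minimal sketch of a generic policy distillation loss for one state with discrete actions: the KL divergence between a teacher policy's action distribution and a student's. This is an illustration of the general technique, not the TAPD implementation. The function names, the use of raw logits, and the `temperature` parameter applied to the teacher are all assumptions made for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn a list of logits into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over discrete actions for a single state.

    Hypothetical helper: the temperature sharpens (< 1) or softens (> 1)
    the teacher distribution only; the student is left unscaled.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# The loss is zero when the student matches the teacher and positive otherwise.
same = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
different = distillation_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

In practice such a loss would be averaged over states sampled from experience and minimized with respect to the student's parameters; how TAPD collects those states and which networks act as teacher and student is described in the paper itself, not in this sketch.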

Code & Models

Repositories

wabbajack1/tapd (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · EEG and Brain-Computer Interfaces