Data-Incremental Continual Offline Reinforcement Learning

Sibo Gai; Donglin Wang

arXiv:2404.12639·cs.LG·December 17, 2024

Open Access

TL;DR

This paper introduces a new setting called data-incremental continual offline reinforcement learning (DICORL), addressing active forgetting caused by conservative offline RL methods, and proposes EREIQL to mitigate this issue and improve learning performance.

Contribution

The paper defines the DICORL setting, identifies active forgetting as its key challenge, and proposes the EREIQL algorithm to reduce forgetting and improve continual offline RL performance.

Findings

01. EREIQL relieves active forgetting in DICORL.

02. EREIQL outperforms existing methods in experiments.

03. Multiple value networks help mitigate conservative learning effects.

Abstract

In this work, we propose a new continual learning setting: data-incremental continual offline reinforcement learning (DICORL), in which an agent continually learns a sequence of datasets for a single offline reinforcement learning (RL) task, rather than a sequence of offline RL tasks with their respective datasets. We then argue that this setting introduces a challenge unique to continual learning: active forgetting, in which the agent actively forgets a skill it has already learnt. The main cause of active forgetting is the conservative learning that offline RL uses to address the overestimation problem. Under conservative learning, the offline RL method indiscriminately suppresses the value of every action, learnt or not, unless it appears in the dataset currently being learned. As a result, inferior data may overwrite premium data simply because of the order in which they are learned.…
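The mechanism described in the abstract can be illustrated with a toy sketch. This is not EREIQL and not the paper's experimental setup: it assumes a single-state, three-action problem and a CQL-style penalty `alpha * (logsumexp(Q) - Q[a])` as a stand-in for "conservative learning". The function names, rewards, and hyperparameters are all hypothetical, chosen only to make the effect visible.

```python
import math


def softmax(q):
    """Numerically stable softmax over a list of Q-values."""
    m = max(q)
    exps = [math.exp(v - m) for v in q]
    total = sum(exps)
    return [e / total for e in exps]


def conservative_fit(q, action, reward, alpha=1.0, lr=0.1, steps=500):
    """Gradient descent on 0.5*(Q[a]-r)^2 + alpha*(logsumexp(Q) - Q[a]).

    Assumption: the dataset holds a single (action, reward) pair, so only
    that action gets a regression target and a push-up term. Every action,
    including the dataset action, gets the push-down term from the penalty.
    """
    q = list(q)
    for _ in range(steps):
        p = softmax(q)  # computed once per step, so the update is simultaneous
        for i in range(len(q)):
            grad = alpha * p[i]  # penalty: suppress every action
            if i == action:
                grad += (q[i] - reward) - alpha  # fit the reward, undo suppression
            q[i] -= lr * grad
    return q


def greedy(q):
    """Index of the action with the highest Q-value."""
    return max(range(len(q)), key=lambda i: q[i])


q_init = [0.0, 0.0, 0.0]
# Dataset 1 (premium): only action 0, reward 1.0.
q_after_a = conservative_fit(q_init, action=0, reward=1.0)
# Dataset 2 (inferior): only action 1, reward 0.2, learned afterwards.
q_after_b = conservative_fit(q_after_a, action=1, reward=0.2)

print("greedy after dataset 1:", greedy(q_after_a))
print("greedy after dataset 2:", greedy(q_after_b))
```

In this sketch the greedy action moves from action 0 to action 1 after the second dataset, even though action 0 earned the higher reward: nothing in the second dataset anchors `Q[0]`, so the penalty keeps lowering it. Replaying the first dataset alongside the second would restore the regression target for action 0, which is the intuition behind experience-replay approaches in general.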

Taxonomy

Topics: EEG and Brain-Computer Interfaces

Methods: Q-Learning · Focus · Experience Replay