Data-Incremental Continual Offline Reinforcement Learning
Sibo Gai, Donglin Wang

TL;DR
This paper introduces a new setting, data-incremental continual offline reinforcement learning (DICORL), identifies active forgetting caused by the conservative learning of offline RL methods as its central challenge, and proposes EREIQL to mitigate this issue and improve learning performance.
Contribution
The paper defines the DICORL setting, identifies active forgetting as a key challenge, and proposes the EREIQL algorithm to reduce forgetting and improve learning in continual offline RL.
Findings
EREIQL relieves active forgetting in DICORL.
EREIQL outperforms existing methods in experiments.
Multiple value networks help mitigate conservative learning effects.
Abstract
In this work, we propose a new continual learning setting, data-incremental continual offline reinforcement learning (DICORL), in which an agent learns continually from a sequence of datasets for a single offline reinforcement learning (RL) task, rather than from a sequence of offline RL tasks, each with its own dataset. We then argue that this setting introduces a unique challenge for continual learning: active forgetting, in which the agent actively forgets skills it has already learned. Active forgetting stems mainly from the conservative learning that offline RL uses to counter value overestimation. Under conservative learning, an offline RL method indiscriminately suppresses the values of all actions, learned or not, unless they appear in the dataset currently being learned. As a result, depending on the learning order, inferior data may overwrite premium data.…
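The forgetting mechanism described in the abstract can be reproduced in a toy, single-state setting. The sketch below is an illustration under stated assumptions, not the authors' method: it uses a CQL-style conservative penalty (log-sum-exp over all actions minus the mean value of dataset actions) as a stand-in for "conservative learning", a one-step reward-regression target instead of bootstrapped TD, and two hypothetical datasets. The names `conservative_fit`, `premium`, and `inferior` are illustrative only. Training on the premium dataset and then on the inferior one drives the premium action's value down even though the second dataset says nothing about it. Replaying the earlier data alongside the new data (plain experience replay, not EREIQL itself) keeps the greedy choice on the premium action.

```python
import numpy as np


def softmax(q):
    """Numerically stable softmax over a vector of action values."""
    z = np.exp(q - q.max())
    return z / z.sum()


def conservative_fit(q, dataset, alpha=1.0, lr=0.1, steps=500):
    """Gradient descent on a toy, single-state, CQL-style objective
    (an assumption for illustration, not the paper's loss):

        alpha * (logsumexp(Q) - mean_{(a, r) in D} Q(a))
        + 0.5 * mean_{(a, r) in D} (Q(a) - r) ** 2
    """
    q = q.astype(float)  # astype returns a copy, so the caller's array is untouched
    actions = np.array([a for a, _ in dataset])
    rewards = np.array([r for _, r in dataset], dtype=float)
    n = len(dataset)
    freq = np.bincount(actions, minlength=q.size) / n  # empirical action frequencies in D
    for _ in range(steps):
        # Conservative term: softmax(q) pushes down *every* action, including
        # actions absent from D; -freq pushes only the dataset actions back up.
        grad = alpha * (softmax(q) - freq)
        # Regression term: only actions that appear in D are fitted to their rewards.
        np.add.at(grad, actions, (q[actions] - rewards) / n)
        q -= lr * grad
    return q


# Hypothetical datasets for one task (one state, two actions):
premium = [(0, 1.0)] * 10   # action 0 earns reward 1.0
inferior = [(1, 0.5)] * 10  # action 1 earns reward 0.5

q_after_premium = conservative_fit(np.zeros(2), premium)
q_seq = conservative_fit(q_after_premium, inferior)               # sequential, no replay
q_replay = conservative_fit(q_after_premium, inferior + premium)  # replay earlier data

print(np.argmax(q_after_premium), np.argmax(q_seq), np.argmax(q_replay))  # 0 1 0
```

Because the conservative term acts on every action while only the current dataset's actions are anchored to their rewards, action 0's value keeps falling during the second phase and the greedy policy switches to the inferior action. Mixing the first dataset back in restores an anchor for action 0.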
Taxonomy
Topics: EEG and Brain-Computer Interfaces
Methods: Q-Learning · Focus · Experience Replay
