OER: Offline Experience Replay for Continual Offline Reinforcement Learning
Sibo Gai, Donglin Wang, Li He

TL;DR
This paper introduces OER, a novel offline experience replay method for continual offline reinforcement learning, addressing distribution shift and knowledge retention challenges to improve performance across sequential tasks.
Contribution
The paper proposes a new algorithm, OER, combining model-based experience selection and dual behavior cloning to enhance continual offline RL performance.
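To make "model-based experience selection" more concrete, here is a minimal sketch. It rests on our own assumptions and is not the authors' code. A learned dynamics model (here any callable `dynamics(state, action)`) rolls the current policy forward from a few start states. The transitions of the finished task whose states lie closest to those predicted states are then kept for the replay buffer. The function name `select_replay_states`, the Euclidean nearest-neighbour criterion, and the fixed `horizon` and `budget` are all hypothetical choices.

```python
import numpy as np


def select_replay_states(dataset_states, start_states, policy, dynamics, horizon, budget):
    """Return indices of `budget` dataset states closest to model rollouts of `policy`.

    Hypothetical sketch: `policy(state) -> action` and
    `dynamics(state, action) -> next_state` stand in for learned networks;
    they are assumptions, not OER's actual components.
    """
    # Roll the policy forward inside the model to approximate the states it would visit.
    rollout_states = []
    for s in start_states:
        for _ in range(horizon):
            a = policy(s)
            s = dynamics(s, a)
            rollout_states.append(s)
    rollout_states = np.asarray(rollout_states)  # shape (R, d)

    # Distance from every dataset state to its nearest model-predicted state.
    diffs = dataset_states[:, None, :] - rollout_states[None, :, :]  # (N, R, d)
    nearest = np.linalg.norm(diffs, axis=-1).min(axis=1)  # (N,)

    # Keep the transitions that best match the policy's predicted visitation.
    return np.argsort(nearest, kind="stable")[:budget]
```

With this rule, the buffer resembles the states the learned policy is predicted to visit, rather than a uniform slice of the behavior data. The dense distance matrix costs O(N·R) memory, so a realistic dataset would need batching or a nearest-neighbour index.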
Findings
OER outperforms state-of-the-art baselines in MuJoCo environments.
The model-based experience selection effectively reduces distribution bias.
Dual behavior cloning improves learning stability on new tasks.
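This page does not describe the dual behavior cloning architecture itself. The sketch below therefore shows only a simpler, related idea and should not be read as OER's design. It combines a TD3+BC-style actor objective on the current task with a second behavior-cloning term on a replay-buffer batch from earlier tasks. The function `actor_loss`, the weights `alpha` and `beta`, and the mean-squared-error form are illustrative assumptions. Gradients are omitted, so the function only evaluates the scalar objective.

```python
import numpy as np


def actor_loss(q_new, pi_new, a_new, pi_old, a_old, alpha=2.5, beta=1.0):
    """Hypothetical combined actor objective for one update (not OER's DBC).

    q_new:          critic values Q(s, pi(s)) on a batch from the current task
    pi_new, a_new:  policy actions vs. dataset actions on that batch
    pi_old, a_old:  policy actions vs. stored actions on a replay-buffer batch
    alpha:          TD3+BC-style normalisation constant (assumed value)
    beta:           weight of the replay cloning term (assumed value)
    """
    lam = alpha / (np.abs(q_new).mean() + 1e-8)  # scale-free weight on Q, as in TD3+BC
    rl_term = -lam * q_new.mean()  # improve the policy on the new task
    bc_new = np.mean((pi_new - a_new) ** 2)  # stay close to the new offline data
    bc_old = np.mean((pi_old - a_old) ** 2)  # clone earlier tasks from the buffer
    return rl_term + bc_new + beta * bc_old
```

A single policy trained on this sum must trade the new-task Q term against both cloning terms at once. A dedicated architecture such as the paper's presumably aims to ease that tension.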
Abstract
An agent should be able to learn new skills continuously from a sequence of pre-collected offline datasets. However, learning a sequence of offline tasks one after another is likely to cause catastrophic forgetting when resources are limited. In this paper, we formulate a new setting, continual offline reinforcement learning (CORL). In CORL, an agent learns a sequence of offline reinforcement learning tasks and aims to perform well on all learned tasks using only a small replay buffer, without exploring the environment of any task in the sequence. To learn consistently across all tasks, the agent must acquire new knowledge while preserving old knowledge, entirely offline. To this end, we bring continual learning algorithms into this setting and experimentally find experience replay (ER) to be the most suitable for the CORL problem.…
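A plain experience-replay loop can illustrate the setting in the abstract: a sequence of offline datasets, a small replay buffer, and no environment interaction. This is a hypothetical sketch, not the OER algorithm. `continual_offline_er`, `train_step`, the per-task budget, and the 50/50 mixing ratio are assumptions. The uniform subsampling at the end of each task is the step that a selection rule, such as the model-based experience selection named in the TL;DR, would replace.

```python
import random


def continual_offline_er(task_datasets, train_step, steps_per_task, batch_size,
                         per_task_budget, replay_ratio=0.5, seed=0):
    """Hypothetical experience-replay loop over a sequence of offline datasets.

    Each dataset is a list of transitions; `train_step(batch)` performs one
    offline RL update and is supplied by the caller. No environment is queried.
    """
    rng = random.Random(seed)
    buffer = []  # small replay buffer shared by all previously learned tasks
    for dataset in task_datasets:
        # Reserve part of each batch for replayed data once earlier tasks exist.
        n_replay = int(batch_size * replay_ratio) if buffer else 0
        for _ in range(steps_per_task):
            batch = rng.choices(dataset, k=batch_size - n_replay)
            if n_replay:
                batch += rng.choices(buffer, k=n_replay)
            train_step(batch)
        # Keep only a few transitions from the finished task (uniform placeholder).
        buffer.extend(rng.sample(dataset, min(per_task_budget, len(dataset))))
    return buffer
```

Mixing a fixed share of replayed transitions into every batch is the simplest form of ER. How the buffer's contents are chosen determines how well it represents earlier tasks.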
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Age of Information Optimization
Methods: Q-Learning · Experience Replay
