OER: Offline Experience Replay for Continual Offline Reinforcement Learning

Sibo Gai; Donglin Wang; Li He

arXiv:2305.13804·cs.LG·April 23, 2024·1 cites

TL;DR

This paper introduces OER, a novel offline experience replay method for continual offline reinforcement learning, addressing distribution shift and knowledge retention challenges to improve performance across sequential tasks.

Contribution

The paper proposes a new algorithm, OER, combining model-based experience selection and dual behavior cloning to enhance continual offline RL performance.

Findings

01 OER outperforms state-of-the-art baselines in MuJoCo environments.

02 The model-based experience selection effectively reduces distribution bias.

03 Dual behavior cloning improves learning stability on new tasks.

Abstract

The ability to continually learn new skills from a sequence of pre-collected offline datasets is desirable for an agent. However, learning a sequence of offline tasks one after another is likely to cause catastrophic forgetting in resource-limited scenarios. In this paper, we formulate a new setting, continual offline reinforcement learning (CORL), in which an agent learns a sequence of offline reinforcement learning tasks and pursues good performance on all learned tasks using a small replay buffer, without exploring the environment of any task in the sequence. To learn consistently across all sequential tasks, an agent must acquire new knowledge while preserving old knowledge, all in an offline manner. To this end, we introduced continual learning algorithms and experimentally found experience replay (ER) to be the most suitable algorithm for the CORL problem.…
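The CORL setting described in the abstract (a sequence of offline tasks plus a small replay buffer) can be illustrated with a minimal sketch of generic experience replay. This is not the OER algorithm: the uniform random subset below stands in for the paper's model-based experience selection, and dual behavior cloning is not shown. The names `TaskReplayBuffer`, `mixed_batch`, `per_task_capacity`, and `replay_fraction` are hypothetical and chosen only for illustration.

```python
import random


class TaskReplayBuffer:
    """Keeps a small, fixed number of transitions from each finished task."""

    def __init__(self, per_task_capacity, seed=0):
        self.per_task_capacity = per_task_capacity
        self.storage = {}  # task_id -> list of stored transitions
        self.rng = random.Random(seed)

    def store_task(self, task_id, dataset):
        # Assumption: uniform random subset. OER's model-based
        # selection is NOT implemented here.
        k = min(self.per_task_capacity, len(dataset))
        self.storage[task_id] = self.rng.sample(list(dataset), k)

    def sample(self, batch_size):
        # Draw with replacement from all stored transitions of old tasks.
        pool = [t for items in self.storage.values() for t in items]
        if not pool:
            return []
        return [self.rng.choice(pool) for _ in range(batch_size)]


def mixed_batch(new_dataset, buffer, batch_size, replay_fraction=0.5, rng=None):
    """Build one training batch mixing new-task data with replayed data."""
    rng = rng or random.Random(0)
    # Replay only once at least one earlier task has been stored.
    n_replay = int(batch_size * replay_fraction) if buffer.storage else 0
    n_new = batch_size - n_replay
    new_part = [rng.choice(new_dataset) for _ in range(n_new)]
    return new_part + buffer.sample(n_replay)
```

In a training loop under these assumptions, the agent would draw `mixed_batch(...)` while learning each task from its offline dataset, then call `store_task(...)` once that task is finished, so later tasks are trained alongside a small sample of earlier ones without any environment interaction.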

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Age of Information Optimization

Methods: Q-Learning · Experience Replay