Forgetting and Imbalance in Robot Lifelong Learning with Off-policy Data

Wenxuan Zhou; Steven Bohez; Jan Humplik; Abbas Abdolmaleki; Dushyant Rao; Markus Wulfmeier; Tuomas Haarnoja; Nicolas Heess

arXiv:2204.05893 · cs.RO · August 19, 2022

TL;DR

This paper addresses challenges in robot lifelong reinforcement learning with off-policy data, proposing an Offline Distillation Pipeline to improve performance across changing environments and mitigate data imbalance issues.

Contribution

The paper introduces the Offline Distillation Pipeline, separating online interaction and offline distillation, to handle the trade-off and data imbalance in lifelong RL.
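
The two-phase structure described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `offline_distillation_pipeline`, `collect_online`, `distill_offline`, and `dataset_sizes` are hypothetical, and the actual online learner and offline RL algorithm are left as caller-supplied callables. The sketch only shows the control flow, assuming data from every environment is kept and a single policy is trained on all of it afterwards without further interaction.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")  # transition type (hypothetical placeholder)
P = TypeVar("P")  # policy type (hypothetical placeholder)


def offline_distillation_pipeline(
    env_ids: Sequence[str],
    collect_online: Callable[[str], List[T]],
    distill_offline: Callable[[Dict[str, List[T]]], P],
) -> P:
    """Run an online phase per environment, then one offline distillation pass."""
    datasets: Dict[str, List[T]] = {}
    for env_id in env_ids:
        # Online interaction phase (assumed): the learner only needs to do
        # well in the current environment; its data is stored per environment.
        datasets.setdefault(env_id, []).extend(collect_online(env_id))
    # Offline distillation phase (assumed): train a single policy on all
    # stored data, with no further environment interaction.
    return distill_offline(datasets)


def dataset_sizes(datasets: Dict[str, List[T]]) -> Dict[str, int]:
    """Hypothetical helper: per-environment data counts, to expose imbalance."""
    return {env_id: len(data) for env_id, data in datasets.items()}
```

Passing `dataset_sizes` in place of a real distillation step is a quick way to inspect how unevenly the environments are represented before any offline training is attempted.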

Findings

1. The pipeline improves performance across multiple environments.

2. Data imbalance causes significant performance drops.

3. Keeping the policy close to the behavior policy mitigates extrapolation error.

Abstract

A robot will experience non-stationary environment dynamics throughout its lifetime: its own dynamics can change due to wear and tear, or its surroundings may change over time. Eventually, the robot should perform well in all of the environment variations it has encountered. At the same time, it should still be able to learn fast in a new environment. We identify two challenges in Reinforcement Learning (RL) under such a lifelong learning setting with off-policy data: first, existing off-policy algorithms struggle with the trade-off between being conservative to maintain good performance in the old environment and learning efficiently in the new environment, despite keeping all the data in the replay buffer. We propose the Offline Distillation Pipeline to break this trade-off by separating the training procedure into an online interaction phase and an offline distillation…

Taxonomy

Topics: Reinforcement Learning in Robotics