Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias

Max Sobol Mark; Archit Sharma; Fahim Tajwar; Rafael Rafailov; Sergey Levine; Chelsea Finn

arXiv:2310.08558 · cs.LG · October 13, 2023 · 1 citation

TL;DR

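This paper introduces offline retraining, a method that decouples the exploration and evaluation policies in online RL. An exploratory policy collects data, and at the end of fine-tuning a separate evaluation policy is trained with pessimistic offline RL on all collected data, which reduces exploration bias and improves performance.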

Contribution

The paper proposes offline retraining within the Offline-to-Online-to-Offline (OOO) framework, which maintains separate policies for exploration and evaluation to improve online RL performance.
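The Offline-to-Online-to-Offline structure can be sketched as a higher-order function with three phases. This is a minimal Python sketch under stated assumptions: `offline_online_offline`, `pretrain`, `collect`, and `retrain_pessimistic` are hypothetical placeholders, not functions from the MaxSobolMark/OOO repository, and the learning algorithms plugged into them are left abstract. The structural point it shows is that the evaluation policy is built only from the accumulated data, never from the exploration policy's parameters.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")  # transition type, left abstract (assumption)
P = TypeVar("P")  # policy type, left abstract (assumption)


def offline_online_offline(
    offline_data: List[T],
    pretrain: Callable[[List[T]], P],
    collect: Callable[[P, List[T]], Tuple[T, P]],
    retrain_pessimistic: Callable[[List[T]], P],
    online_steps: int,
) -> Tuple[P, P]:
    """Hypothetical sketch of the three OOO phases; all callables are placeholders."""
    # Phase 1 (offline): initialize the exploration policy from prior data
    # (an empty list corresponds to purely online RL).
    explore_policy = pretrain(offline_data)
    buffer = list(offline_data)
    # Phase 2 (online): the exploration policy interacts with the environment,
    # may use optimistic objectives, and is updated as it goes.
    for _ in range(online_steps):
        transition, explore_policy = collect(explore_policy, buffer)
        buffer.append(transition)
    # Phase 3 (offline): train a separate evaluation policy from scratch on
    # ALL data; it sees only the buffer, not the exploration policy.
    eval_policy = retrain_pessimistic(buffer)
    return explore_policy, eval_policy


# Toy usage with trivial stand-in callables.
explore, evaluate = offline_online_offline(
    offline_data=[0, 1, 2],
    pretrain=lambda data: "explorer",
    collect=lambda policy, buf: (len(buf), policy),
    retrain_pessimistic=lambda data: ("evaluator", len(data)),
    online_steps=2,
)
print(explore, evaluate)  # explorer ('evaluator', 5)
```

Passing the exploration policy's updates back through `collect` keeps it free to change its objective online, while `retrain_pessimistic` receives nothing but data, which is what "trained from scratch" means here.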

Findings

01. Improves the average performance of existing offline-to-online and online RL methods by 14% to 26% in fine-tuning experiments.

02. Achieves state-of-the-art results on several environments in the D4RL benchmark.

03. Improves online RL performance by 165% on two OpenAI Gym environments.

Abstract

It is desirable for policies to optimistically explore new states and behaviors during online reinforcement learning (RL) or fine-tuning, especially when prior offline data does not provide enough state coverage. However, exploration bonuses can bias the learned policy, and our experiments find that naive, yet standard use of such bonuses can fail to recover a performant policy. Concurrently, pessimistic training in offline RL has enabled recovery of performant policies from static datasets. Can we leverage offline RL to recover better policies from online interaction? We make a simple observation that a policy can be trained from scratch on all interaction data with pessimistic objectives, thereby decoupling the policies used for data collection and for evaluation. Specifically, we propose offline retraining, a policy extraction step at the end of online fine-tuning in our…
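To make the exploration-bias argument concrete, here is a toy tabular analogue in Python, assuming a single-state bandit whose log consists of (action, reward) pairs. The count-based bonus and penalty of the form c / sqrt(n) are a swapped-in UCB/LCB-style stand-in chosen for illustration, not the paper's method (the paper pairs offline retraining with deep RL algorithms), and `exploration_policy` and `offline_retrain` are hypothetical names. The same logged data yields different choices depending on whether the objective is optimistic or pessimistic.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Toy sketch: a logged transition in a single-state bandit is (action, reward).
# The c / sqrt(n) bonus and penalty below are assumed stand-ins, not the paper's method.
Transition = Tuple[int, float]


def action_stats(data: List[Transition]) -> Dict[int, Tuple[int, float]]:
    """Visit count and empirical mean reward for each action in the log."""
    counts: Dict[int, int] = defaultdict(int)
    totals: Dict[int, float] = defaultdict(float)
    for action, reward in data:
        counts[action] += 1
        totals[action] += reward
    return {a: (counts[a], totals[a] / counts[a]) for a in counts}


def exploration_policy(data: List[Transition], bonus: float = 1.0) -> int:
    """Optimistic choice: mean reward plus a bonus that favors rarely tried actions."""
    stats = action_stats(data)
    return max(stats, key=lambda a: stats[a][1] + bonus / math.sqrt(stats[a][0]))


def offline_retrain(data: List[Transition], penalty: float = 1.0) -> int:
    """Pessimistic choice re-fit from the log alone: penalize poorly covered actions."""
    stats = action_stats(data)
    return max(stats, key=lambda a: stats[a][1] - penalty / math.sqrt(stats[a][0]))


# Action 0 was tried 100 times (reward 0.7 each); action 1 was tried once (reward 1.0).
data = [(0, 0.7)] * 100 + [(1, 1.0)]
print(exploration_policy(data))  # 1: scores 0.8 vs 2.0, the bonus dominates
print(offline_retrain(data))     # 0: scores 0.6 vs 0.0, the well-supported action wins
```

The exploration policy is reasonable to try action 1 again, since a single sample says little; the point of decoupling is that this optimism need not leak into the policy that is finally evaluated.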

Code & Models

Repositories

MaxSobolMark/OOO (JAX, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Digital Games and Media