Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning

Denis Yarats; David Brandfonbrener; Hao Liu; Michael Laskin; Pieter Abbeel; Alessandro Lazaric; Lerrel Pinto

arXiv:2201.13425 · cs.LG · April 7, 2022 · 21 citations

TL;DR

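This paper introduces ExORL, a data-centric approach to offline reinforcement learning. Rather than modifying the algorithm, it generates exploratory data with unsupervised, reward-free exploration, which lets vanilla off-policy RL algorithms match or outperform state-of-the-art offline RL algorithms on downstream tasks.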

Contribution

The paper proposes collecting offline RL data with unsupervised, reward-free exploration and then relabeling it with a downstream reward, and it shows that standard off-policy RL algorithms trained on this data are effective in the offline setting.

Findings

01

Exploratory data enables vanilla off-policy RL algorithms, without offline-specific modifications, to match or outperform state-of-the-art offline RL algorithms.

02

Data generation is as crucial as algorithm design in offline RL.

03

Relabeling reward-free exploratory data with a downstream reward makes it usable for training a policy on that task with offline RL.

Abstract

Recent progress in deep learning has relied on access to large and diverse datasets. Such data-driven progress has been less evident in offline reinforcement learning (RL), because offline RL data is usually collected to optimize specific target tasks, limiting the data's diversity. In this work, we propose Exploratory data for Offline RL (ExORL), a data-centric approach to offline RL. ExORL first generates data with unsupervised reward-free exploration, then relabels this data with a downstream reward before training a policy with offline RL. We find that exploratory data allows vanilla off-policy RL algorithms, without any offline-specific modifications, to outperform or match state-of-the-art offline RL algorithms on downstream tasks. Our findings suggest that data generation is as important as algorithmic advances for offline RL and hence requires careful consideration from the…
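
The first two stages described in the abstract (reward-free data collection, then relabeling with a downstream reward) can be illustrated with a minimal sketch. This is not the authors' implementation: the random walk stands in for an unsupervised exploration algorithm, and `Transition`, `collect_reward_free`, `relabel`, and `reach_goal_reward` are hypothetical names chosen for illustration. The third stage, training a policy on the relabeled data with offline RL, is not shown.

```python
import random
from typing import Callable, List, NamedTuple


class Transition(NamedTuple):
    state: float
    action: float
    reward: float
    next_state: float


def collect_reward_free(num_steps: int, seed: int = 0) -> List[Transition]:
    """Stand-in for unsupervised exploration (assumption): a random walk on [-1, 1].

    No task reward is available at collection time, so reward is stored as 0.0.
    """
    rng = random.Random(seed)
    state = 0.0
    data = []
    for _ in range(num_steps):
        action = rng.uniform(-1.0, 1.0)
        next_state = max(-1.0, min(1.0, state + 0.1 * action))
        data.append(Transition(state, action, 0.0, next_state))
        state = next_state
    return data


def relabel(
    data: List[Transition],
    reward_fn: Callable[[float, float, float], float],
) -> List[Transition]:
    """Overwrite the reward of each stored transition using the downstream reward."""
    return [
        t._replace(reward=reward_fn(t.state, t.action, t.next_state))
        for t in data
    ]


def reach_goal_reward(state: float, action: float, next_state: float,
                      goal: float = 0.5) -> float:
    """Hypothetical downstream task: negative distance to a goal position."""
    return -abs(next_state - goal)


dataset = collect_reward_free(1000)
relabeled = relabel(dataset, reach_goal_reward)
```

Because relabeling only rewrites the reward field, the same exploratory dataset can be reused for a different downstream task by passing a different `reward_fn`.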

Code & Models

Repositories

denisyarats/exorl
PyTorch · Official

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Reinforcement Learning in Robotics