Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning
Denis Yarats, David Brandfonbrener, Hao Liu, Michael Laskin, Pieter Abbeel, Alessandro Lazaric, Lerrel Pinto

TL;DR
This paper introduces ExORL, a data-centric approach to offline reinforcement learning: collect data with unsupervised, reward-free exploration, relabel it with the downstream reward, and train on it offline. With such data, vanilla off-policy RL algorithms match or outperform specialized offline RL methods.
Contribution
The paper proposes a data collection recipe for offline RL that combines unsupervised, reward-free exploration with reward relabeling, and shows that standard off-policy RL algorithms, without offline-specific modifications, perform well on the resulting data.
Findings
Exploratory data enables vanilla off-policy RL algorithms, without offline-specific modifications, to match or outperform state-of-the-art offline RL algorithms.
Data generation is as crucial as algorithm design in offline RL.
Reward-free exploratory data can be relabeled with a downstream reward and then used to train a policy offline.
Abstract
Recent progress in deep learning has relied on access to large and diverse datasets. Such data-driven progress has been less evident in offline reinforcement learning (RL), because offline RL data is usually collected to optimize specific target tasks, which limits the data's diversity. In this work, we propose Exploratory data for Offline RL (ExORL), a data-centric approach to offline RL. ExORL first generates data with unsupervised reward-free exploration, then relabels this data with a downstream reward before training a policy with offline RL. We find that exploratory data allows vanilla off-policy RL algorithms, without any offline-specific modifications, to outperform or match state-of-the-art offline RL algorithms on downstream tasks. Our findings suggest that data generation is as important as algorithmic advances for offline RL and hence requires careful consideration from the…
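The relabeling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the dataset layout (a dict of NumPy arrays), the function names `relabel` and `reach_goal_reward`, and the goal-reaching reward itself are all assumptions made for illustration, and random arrays stand in for data that an exploration policy would have collected.

```python
import numpy as np


def relabel(dataset, reward_fn):
    """Return a copy of a reward-free dataset with a 'reward' array added.

    `dataset` is assumed to hold 'obs', 'action', and 'next_obs' arrays
    with one row per transition (hypothetical layout, not from the paper).
    """
    rewards = reward_fn(dataset["obs"], dataset["action"], dataset["next_obs"])
    return {**dataset, "reward": np.asarray(rewards, dtype=np.float32)}


def reach_goal_reward(obs, action, next_obs, goal=np.array([1.0, 0.0])):
    """Hypothetical downstream reward: negative distance from next_obs to a goal."""
    return -np.linalg.norm(next_obs - goal, axis=-1)


# Stand-in for reward-free exploratory data (random here, for illustration only).
rng = np.random.default_rng(0)
n, obs_dim, act_dim = 5, 2, 1
dataset = {
    "obs": rng.normal(size=(n, obs_dim)),
    "action": rng.uniform(-1.0, 1.0, size=(n, act_dim)),
    "next_obs": rng.normal(size=(n, obs_dim)),
}

# Relabel once per downstream task; the same transitions can be reused
# with a different reward_fn for a different task.
labeled = relabel(dataset, reach_goal_reward)
```

The relabeled transitions would then be passed to an off-policy learner as a fixed replay buffer. Because the reward is computed after collection, one exploratory dataset can in principle serve several downstream tasks.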
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Reinforcement Learning in Robotics
