Efficient Reinforcement Learning by Guiding Generalist World Models with Non-Curated Data
Yi Zhao, Aidan Scannell, Wenshuai Zhao, Yuxin Hou, Tianyu Cui, Le Chen, Dieter Büchler, Arno Solin, Juho Kannala, Joni Pajarinen

TL;DR
This paper introduces techniques to effectively leverage abundant non-curated, reward-free offline data to improve reinforcement learning sample efficiency, addressing distributional shift issues during fine-tuning.
Contribution
It proposes experience rehearsal and execution guidance methods to utilize non-curated offline data effectively in RL training, significantly enhancing sample efficiency.
Findings
Achieves a 102.8% relative improvement over learning-from-scratch baselines under limited sample budgets.
Outperforms prior offline data methods on locomotion and robotic manipulation tasks.
Effectively uses mixed-quality, multi-embodiment offline data for RL.
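To make the headline number concrete: a relative improvement above 100% means the score more than doubles relative to the baseline. The sketch below shows the standard arithmetic only. The function name and the example scores are hypothetical, and the summary does not say how the paper aggregates scores across tasks.

```python
def relative_improvement(baseline_score: float, method_score: float) -> float:
    """Return the percentage gain of method_score over baseline_score."""
    if baseline_score == 0:
        raise ValueError("baseline_score must be non-zero")
    return 100.0 * (method_score - baseline_score) / baseline_score


# Hypothetical scores, chosen only to illustrate the arithmetic:
# a baseline of 50.0 and a method score of 101.4 give roughly 102.8%.
gain = relative_improvement(50.0, 101.4)
print(f"{gain:.1f}% relative improvement")
```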
Abstract
Leveraging offline data is a promising way to improve the sample efficiency of online reinforcement learning (RL). This paper expands the pool of usable data for offline-to-online RL by leveraging abundant non-curated data that is reward-free, of mixed quality, and collected across multiple embodiments. Although learning a world model appears promising for utilizing such data, we find that naive fine-tuning fails to accelerate RL training on many tasks. Through careful investigation, we attribute this failure to the distributional shift between offline and online data during fine-tuning. To address this issue and effectively use the offline data, we propose two essential techniques: i) experience rehearsal and ii) execution guidance. With these modifications, the non-curated offline data substantially improves RL's sample efficiency. Under limited sample budgets, our…
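The abstract names experience rehearsal without describing its mechanics. One generic way to reduce distributional shift during fine-tuning is to keep replaying offline transitions alongside fresh online ones in every training batch. The sketch below shows that mixed-batch sampling idea only. Whether it matches the paper's actual procedure is an assumption, and the function name, buffer layout, and `offline_fraction` parameter are hypothetical.

```python
import random


def sample_mixed_batch(online_buffer, offline_buffer, batch_size,
                       offline_fraction=0.5, rng=None):
    """Draw a training batch that mixes offline and online transitions.

    Assumption: both buffers are non-empty lists of transitions.
    Sampling is with replacement, so small buffers are allowed.
    """
    rng = rng or random.Random()
    # Number of offline samples, clamped to the range [0, batch_size].
    n_offline = max(0, min(batch_size, round(batch_size * offline_fraction)))
    n_online = batch_size - n_offline
    batch = rng.choices(offline_buffer, k=n_offline)
    batch += rng.choices(online_buffer, k=n_online)
    rng.shuffle(batch)  # avoid a fixed offline-then-online ordering
    return batch


# Toy usage with placeholder transitions tagged by their source.
online = [("online", i) for i in range(10)]
offline = [("offline", i) for i in range(100)]
batch = sample_mixed_batch(online, offline, batch_size=8,
                           offline_fraction=0.25, rng=random.Random(0))
```

A fixed mixing ratio is the simplest design choice here. A retrieval step that selects only task-relevant offline trajectories would be a natural refinement, but nothing in this summary confirms the paper's exact approach.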
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Social Robot Interaction and HRI
