Efficient Reinforcement Learning by Guiding Generalist World Models with Non-Curated Data

Yi Zhao; Aidan Scannell; Wenshuai Zhao; Yuxin Hou; Tianyu Cui; Le Chen; Dieter Büchler; Arno Solin; Juho Kannala; Joni Pajarinen

arXiv:2502.19544 · cs.LG · May 20, 2025

Open Access

TL;DR

This paper introduces techniques to effectively leverage abundant non-curated, reward-free offline data to improve reinforcement learning sample efficiency, addressing distributional shift issues during fine-tuning.

Contribution

It proposes experience rehearsal and execution guidance methods to utilize non-curated offline data effectively in RL training, significantly enhancing sample efficiency.

Findings

01. Achieves 102.8% relative improvement over learning-from-scratch baselines.

02. Outperforms prior offline data methods on locomotion and robotic manipulation tasks.

03. Effectively uses mixed-quality, multi-embodiment offline data for RL.

Abstract

Leveraging offline data is a promising way to improve the sample efficiency of online reinforcement learning (RL). This paper expands the pool of usable data for offline-to-online RL by leveraging abundant non-curated data that is reward-free, of mixed quality, and collected across multiple embodiments. Although learning a world model appears promising for utilizing such data, we find that naive fine-tuning fails to accelerate RL training on many tasks. Through careful investigation, we attribute this failure to the distributional shift between offline and online data during fine-tuning. To address this issue and effectively use the offline data, we propose two essential techniques: i) experience rehearsal and ii) execution guidance. With these modifications, the non-curated offline data substantially improves RL's sample efficiency. Under limited sample budgets, our…
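
The abstract names the two techniques but does not define them. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes experience rehearsal means mixing stored offline transitions into each online training batch, and execution guidance means occasionally executing an action from a guide policy derived from the offline data instead of the agent's own action. The names `sample_mixed_batch`, `choose_action`, `offline_fraction`, and `guide_prob` are hypothetical and do not come from the paper.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_mixed_batch(
    online: Sequence[T],
    offline: Sequence[T],
    batch_size: int,
    offline_fraction: float,
    rng: random.Random,
) -> List[T]:
    """Assumed form of experience rehearsal: draw part of each batch
    from offline data so training keeps seeing the offline distribution."""
    if not 0.0 <= offline_fraction <= 1.0:
        raise ValueError("offline_fraction must be in [0, 1]")
    n_offline = round(batch_size * offline_fraction)
    n_online = batch_size - n_offline
    # Sample with replacement from each buffer.
    batch = [rng.choice(offline) for _ in range(n_offline)]
    batch += [rng.choice(online) for _ in range(n_online)]
    return batch


def choose_action(agent_action, guide_action, guide_prob: float, rng: random.Random):
    """Assumed form of execution guidance: with probability guide_prob,
    execute the guide policy's action instead of the agent's."""
    return guide_action if rng.random() < guide_prob else agent_action
```

Under these assumptions, `offline_fraction` controls how strongly each update is anchored to the offline data, and `guide_prob` controls how often data collection is steered toward states resembling that data. Both would be tuning choices rather than values taken from the paper.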

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Social Robot Interaction and HRI