Reset-free Reinforcement Learning with World Models

Zhao Yang; Thomas M. Moerland; Mike Preuss; Aske Plaat; Edward S. Hu

arXiv:2408.09807 · cs.AI · February 25, 2025

Open Access

TL;DR

This paper introduces MoReFree, a model-based reinforcement learning agent for reset-free tasks. It reduces the human effort spent on resetting the agent and environment, and it outperforms prior state-of-the-art reset-free methods in data efficiency and task performance while requiring less supervision.

Contribution

The paper proposes MoReFree, a model-based RL method that adapts exploration and policy learning to reset-free tasks by prioritizing task-relevant states. It outperforms prior methods without requiring environmental rewards or demonstrations.

Findings

01. MoReFree outperforms state-of-the-art reset-free RL methods.

02. MoReFree achieves higher data efficiency in various tasks.

03. MoReFree operates without access to environmental rewards or demonstrations.

Abstract

Reinforcement learning (RL) is an appealing paradigm for training intelligent agents, enabling policy acquisition from the agent's own autonomously acquired experience. However, the training process of RL is far from automatic, requiring extensive human effort to reset the agent and environments. To tackle the challenging reset-free setting, we first demonstrate the superiority of model-based (MB) RL methods in this setting, showing that a straightforward adaptation of MBRL can outperform all prior state-of-the-art methods while requiring less supervision. We then identify limitations inherent to this direct extension and propose a solution, the model-based reset-free (MoReFree) agent, which further enhances performance. MoReFree adapts two key mechanisms, exploration and policy learning, to handle reset-free tasks by prioritizing task-relevant states. It exhibits superior…
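
To make the reset-free setting concrete, here is a minimal toy sketch in Python. It is not the MoReFree algorithm and uses no world model. It only illustrates two ideas named in the abstract: the agent's state is set once and never reset, and goals are biased toward task-relevant states. Everything in it is an assumption made for illustration: the 1-D chain environment, the choice of the initial state and the task goal as the "task-relevant" states, the mixing probability `alpha`, and the function names `sample_goal` and `run_reset_free`. A hand-written "step toward the goal" rule stands in for a learned goal-conditioned policy.

```python
import random


def sample_goal(rng, task_relevant, all_states, alpha):
    """Pick a task-relevant state with probability alpha, else any state.

    Hypothetical helper: alpha and the goal sets are illustrative assumptions.
    """
    if rng.random() < alpha:
        return rng.choice(task_relevant)
    return rng.choice(all_states)


def run_reset_free(total_steps=200, goal_horizon=10, alpha=0.5, seed=0):
    """Run one continuous stream of interaction with no environment resets."""
    rng = random.Random(seed)
    all_states = list(range(11))   # toy 1-D chain: positions 0..10
    task_relevant = [0, 10]        # assumed: initial state and task goal

    state = 0                      # set once; never reset afterwards
    goal = sample_goal(rng, task_relevant, all_states, alpha)
    goals_used = [goal]
    trajectory = [state]

    for t in range(1, total_steps + 1):
        # Stand-in for a learned goal-conditioned policy: step toward the goal.
        if state < goal:
            state += 1
        elif state > goal:
            state -= 1
        trajectory.append(state)

        # Instead of resetting, switch to a new goal and continue from
        # wherever the agent currently is.
        if t % goal_horizon == 0:
            goal = sample_goal(rng, task_relevant, all_states, alpha)
            goals_used.append(goal)

    return trajectory, goals_used


if __name__ == "__main__":
    traj, goals = run_reset_free()
    print("goals pursued:", goals)
    print("final state:", traj[-1])
```

In this sketch, raising `alpha` makes the agent spend more time travelling between the initial state and the task goal, which is one simple way to picture "prioritizing task-relevant states"; with `alpha = 0` the goals are chosen uniformly over all states.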

Taxonomy

Topics: Reinforcement Learning in Robotics