A New View on Planning in Online Reinforcement Learning

Kevin Roice; Parham Mohammad Panahi; Scott M. Jordan; Adam White; Martha White

arXiv:2406.01562 · cs.LG · June 4, 2024

TL;DR

This paper introduces goal-space planning (GSP), an approach to online reinforcement learning that constrains background planning to a set of abstract subgoals. Planning only over subgoals improves computational efficiency and learning speed, because it avoids iterating an inaccurate learned model over many steps.

Contribution

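The paper proposes GSP, a method that learns only local, subgoal-conditioned models and plans over an abstract set of subgoals. This makes background planning more computationally efficient, incorporates temporal abstraction for faster long-horizon planning, and avoids learning the transition dynamics entirely.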

Findings

01 GSP accelerates learning across various domains.

02 GSP propagates value effectively in an abstract goal space.

03 GSP is more computationally efficient than traditional model-based methods.

Abstract

This paper investigates a new approach to model-based reinforcement learning using background planning: mixing (approximate) dynamic programming updates and model-free updates, similar to the Dyna architecture. Background planning with learned models is often worse than model-free alternatives, such as Double DQN, even though the former uses significantly more memory and computation. The fundamental problem is that learned models can be inaccurate and often generate invalid states, especially when iterated many steps. In this paper, we avoid this limitation by constraining background planning to a set of (abstract) subgoals and learning only local, subgoal-conditioned models. This goal-space planning (GSP) approach is more computationally efficient, naturally incorporates temporal abstraction for faster long-horizon planning and avoids learning the transition dynamics entirely. We show…
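As a rough illustration of what planning over a set of subgoals could look like, the sketch below runs value iteration on a tiny hand-made goal graph. It assumes each subgoal-conditioned model can be summarized by two numbers per subgoal pair: an accumulated reward and an accumulated discount for travelling from one subgoal to the next. These tables, the subgoal names, and the function `goal_space_value_iteration` are hypothetical and chosen only for illustration; the abstract does not specify the update rule, so this should not be read as the authors' algorithm.

```python
# Hypothetical sketch: value iteration over a small abstract goal space.
# Every name and number here is an illustrative assumption, not from the paper.


def goal_space_value_iteration(rewards, discounts, terminal_values,
                               num_sweeps=100, tol=1e-8):
    """Propagate value between subgoals only, never through raw states.

    rewards[g][g2]   -- assumed accumulated reward for going from g to g2
    discounts[g][g2] -- assumed accumulated discount over that same trip
    terminal_values  -- fixed values for subgoals that end the episode
    A missing entry rewards[g][g2] means g2 is not reachable from g.
    """
    values = {g: terminal_values.get(g, 0.0) for g in rewards}
    for _ in range(num_sweeps):
        max_change = 0.0
        for g, outgoing in rewards.items():
            if g in terminal_values or not outgoing:
                continue  # nothing to back up for terminal or dead-end subgoals
            # Bellman-style backup restricted to reachable subgoals.
            best = max(outgoing[g2] + discounts[g][g2] * values[g2]
                       for g2 in outgoing)
            max_change = max(max_change, abs(best - values[g]))
            values[g] = best
        if max_change < tol:
            break
    return values


gamma = 0.9  # assumed per-step discount

# Toy goal graph: "start" can reach "door" (5 steps) or "exit" directly
# (10 steps, smaller reward); "door" reaches "exit" in 3 steps.
rewards = {
    "start": {"door": 0.0, "exit": 0.4},
    "door": {"exit": 1.0},
    "exit": {},
}
discounts = {
    "start": {"door": gamma ** 5, "exit": gamma ** 10},
    "door": {"exit": gamma ** 3},
    "exit": {},
}

values = goal_space_value_iteration(rewards, discounts, {"exit": 0.0})
print(values)
```

In this toy graph the backup at `start` prefers the route through `door`, since the discounted value of reaching `door` exceeds the reward of the direct route. Each sweep touches only three subgoals, which is the kind of saving that planning in an abstract goal space is meant to provide compared with iterating a one-step model over individual states.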

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Sparse Evolutionary Training · Double Q-learning · Dense Connections · Balanced Selection · Q-Learning · Experience Replay · Deep Q-Network · Convolution · Double DQN