A New View on Planning in Online Reinforcement Learning
Kevin Roice, Parham Mohammad Panahi, Scott M. Jordan, Adam White, Martha White

TL;DR
This paper introduces goal-space planning (GSP), an approach to online reinforcement learning that constrains background planning to a set of subgoals. This improves computational efficiency and learning speed while avoiding the errors that inaccurate learned models introduce.
Contribution
The paper proposes GSP, a method that plans over abstract subgoals using local, subgoal-conditioned models. Planning becomes more computationally efficient, gains temporal abstraction for long horizons, and no longer requires learning the transition dynamics.
Findings
GSP accelerates learning across various domains.
It effectively propagates value in an abstract goal space.
GSP is more computationally efficient than traditional model-based background planning.
Abstract
This paper investigates a new approach to model-based reinforcement learning using background planning: mixing (approximate) dynamic programming updates and model-free updates, similar to the Dyna architecture. Background planning with learned models is often worse than model-free alternatives, such as Double DQN, even though the former uses significantly more memory and computation. The fundamental problem is that learned models can be inaccurate and often generate invalid states, especially when iterated many steps. In this paper, we avoid this limitation by constraining background planning to a set of (abstract) subgoals and learning only local, subgoal-conditioned models. This goal-space planning (GSP) approach is more computationally efficient, naturally incorporates temporal abstraction for faster long-horizon planning and avoids learning the transition dynamics entirely. We show…
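To make the idea of planning over subgoals concrete, here is a minimal sketch of value iteration in an abstract goal space. It is not the authors' implementation. It assumes the subgoal-conditioned models have already been learned and are given as two matrices, both hypothetical names: `reward[i, j]`, the discounted reward accumulated while travelling from subgoal i to subgoal j, and `discount[i, j]`, the discount accumulated along the way (zero when j is not reachable from i).

```python
import numpy as np

def goal_space_value_iteration(reward, discount, n_sweeps=100, tol=1e-8):
    """Propagate value between subgoals only (illustrative sketch).

    reward[i, j]   : assumed discounted reward from subgoal i to subgoal j.
    discount[i, j] : assumed accumulated discount from i to j; 0 means
                     subgoal j is not reachable from subgoal i.
    """
    n = reward.shape[0]
    v = np.zeros(n)
    reachable = discount > 0
    has_successor = reachable.any(axis=1)
    for _ in range(n_sweeps):
        # Back up each subgoal from every subgoal it can reach.
        q = np.where(reachable, reward + discount * v[None, :], -np.inf)
        # Subgoals with no reachable successor keep a value of 0.
        new_v = np.where(has_successor, q.max(axis=1), 0.0)
        converged = np.max(np.abs(new_v - v)) < tol
        v = new_v
        if converged:
            break
    return v

# Hypothetical chain of three subgoals: 0 -> 1 -> 2, with reward 1 on
# reaching subgoal 2 and an accumulated discount of 0.9 per hop.
reward = np.zeros((3, 3))
discount = np.zeros((3, 3))
discount[0, 1] = 0.9
discount[1, 2] = 0.9
reward[1, 2] = 1.0

v = goal_space_value_iteration(reward, discount)
```

Each sweep touches only the subgoal-to-subgoal entries, so its cost depends on the number of subgoals rather than the size of the state space, and one backup can span many primitive steps. How these subgoal values then feed back into the model-free updates is left out of the sketch.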
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Sparse Evolutionary Training · Double Q-learning · Dense Connections · Balanced Selection · Q-Learning · Experience Replay · Deep Q-Network · Convolution · Double DQN
