Search on the Replay Buffer: Bridging Planning and Reinforcement Learning

Benjamin Eysenbach; Ruslan Salakhutdinov; Sergey Levine

arXiv:1906.05253 · cs.AI · June 13, 2019 · 39 citations

TL;DR

This paper introduces Search on the Replay Buffer (SoRB), an algorithm that combines planning and reinforcement learning: it builds a graph whose nodes are observations from the replay buffer and plans over that graph to solve long-horizon, sparse-reward tasks.
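The graph construction described above can be sketched as follows. This is a minimal illustration, not the authors' code: observations are toy 2-D points, `build_graph`, `euclidean`, and the `max_dist` cutoff are hypothetical names, and Euclidean distance is a placeholder standing in for the learned RL distance that SoRB actually uses for edge weights.

```python
import math
from typing import Callable, Dict, List, Tuple

Obs = Tuple[float, float]            # toy 2-D observation (assumption)
Graph = Dict[int, Dict[int, float]]  # node -> {neighbor: edge weight}


def build_graph(
    buffer: List[Obs],
    distance: Callable[[Obs, Obs], float],
    max_dist: float,
) -> Graph:
    """Directed graph whose nodes are replay-buffer indices.

    An edge i -> j is kept only if distance(buffer[i], buffer[j]) <= max_dist.
    The hypothetical max_dist cutoff encodes the assumption that only
    short-range distance estimates are reliable enough to use as edges.
    """
    graph: Graph = {i: {} for i in range(len(buffer))}
    for i, s in enumerate(buffer):
        for j, g in enumerate(buffer):
            if i == j:
                continue
            d = distance(s, g)
            if d <= max_dist:
                graph[i][j] = d
    return graph


def euclidean(s: Obs, g: Obs) -> float:
    # Placeholder only: SoRB derives distances from a learned value function.
    return math.dist(s, g)


buffer = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
print(build_graph(buffer, euclidean, max_dist=1.5))
# {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0}, 3: {}}
```

The all-pairs loop costs O(N^2) distance queries, so in practice the node set would be kept small. The isolated node 3 shows the effect of the cutoff: a state far from everything else gets no edges at all.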

Contribution

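The paper proposes building a graph whose nodes are previously seen observations from the replay buffer and whose edge weights come from a goal-conditioned value function learned with RL. Planning over this graph enables long-horizon behavior even in high-dimensional, image-based environments.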
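One way a goal-conditioned value function can yield edge weights, sketched under stated assumptions: if the agent receives a reward of -1 on every step until it reaches the goal and rewards are not discounted, then V(s, g) equals minus the number of steps needed to reach g, so a distance is d(s, g) = -V(s, g). The toy below checks that relationship with tabular value iteration on a hypothetical 5-state chain; `goal_conditioned_values` and `distance` are illustrative names, and the tabular setup stands in for the learned deep value function.

```python
from typing import Dict, Tuple

Values = Dict[Tuple[int, int], float]  # (state, goal) -> V(s, g)


def goal_conditioned_values(n_states: int, n_iters: int = 100) -> Values:
    """Tabular value iteration for V(s, g) on a toy 1-D chain.

    Assumed setup: actions move one cell left or right (clipped at the
    ends), reward is -1 per step, no discounting, and the episode ends
    when s == g.
    """
    V: Values = {(s, g): 0.0 for s in range(n_states) for g in range(n_states)}
    for _ in range(n_iters):
        new_V: Values = {}
        for s in range(n_states):
            for g in range(n_states):
                if s == g:
                    new_V[(s, g)] = 0.0  # terminal: goal reached
                    continue
                next_states = (max(s - 1, 0), min(s + 1, n_states - 1))
                new_V[(s, g)] = -1.0 + max(V[(s2, g)] for s2 in next_states)
        V = new_V
    return V


def distance(V: Values, s: int, g: int) -> float:
    # With -1 reward per step and no discounting, -V(s, g) counts the steps.
    return -V[(s, g)]


V = goal_conditioned_values(n_states=5)
print(distance(V, 0, 4), distance(V, 3, 1))  # 4.0 2.0
```

On the chain the recovered distance is exactly |s - g|; with a learned approximator the estimate is only trustworthy locally, which is one motivation for keeping only short edges in the graph.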

Findings

01

SoRB solves sparse-reward tasks that span over one hundred steps.

02

It generalizes better than standard RL algorithms.

03

It uses graph search over the replay buffer to generate subgoals automatically, even in image-based environments.
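Subgoal generation by graph search can be illustrated with a shortest-path query. This sketch uses Dijkstra's algorithm as one standard choice (the paper's own search routine may differ), assumes the start state and goal are already nodes of the graph, and uses hypothetical helper names `shortest_path` and `subgoals`. The intermediate nodes on the path are the waypoints that a goal-conditioned policy would be asked to reach in turn.

```python
import heapq
from typing import Dict, List, Optional, Set

Graph = Dict[int, Dict[int, float]]  # node -> {neighbor: edge weight}


def shortest_path(graph: Graph, start: int, goal: int) -> Optional[List[int]]:
    """Dijkstra over a weighted directed graph; None if goal is unreachable."""
    dist = {start: 0.0}
    prev: Dict[int, int] = {}
    heap = [(0.0, start)]
    visited: Set[int] = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        if u == goal:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in visited:
        return None
    path = [goal]
    while path[-1] != start:  # walk predecessors back to the start
        path.append(prev[path[-1]])
    return path[::-1]


def subgoals(graph: Graph, start: int, goal: int) -> Optional[List[int]]:
    # Intermediate nodes are the waypoints for the goal-conditioned policy.
    path = shortest_path(graph, start, goal)
    return None if path is None else path[1:-1]


graph = {0: {1: 1.0, 3: 10.0}, 1: {2: 1.0}, 2: {3: 1.0}, 3: {}}
print(subgoals(graph, 0, 3))  # [1, 2]
```

Here the direct edge 0 -> 3 is long (10.0), so the search prefers three short hops through nodes 1 and 2, mirroring the idea of replacing one hard long-horizon goal with a chain of easy nearby ones.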

Abstract

The history of learning for control has been an exciting back and forth between two broad classes of algorithms: planning and reinforcement learning. Planning algorithms effectively reason over long horizons, but assume access to a local policy and distance metric over collision-free paths. Reinforcement learning excels at learning policies and the relative values of states, but fails to plan over long horizons. Despite the successes of each method in various domains, tasks that require reasoning over long horizons with limited feedback and high-dimensional observations remain exceedingly challenging for both planning and reinforcement learning algorithms. Frustratingly, these sorts of tasks are potentially the most useful, as they are simple to design (a human only needs to provide an example goal state) and avoid reward shaping, which can bias the agent towards finding a sub-optimal…

Code & Models

Repositories

google-research/google-research (TensorFlow, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Robotic Path Planning Algorithms · Multimodal Machine Learning Applications