Search on the Replay Buffer: Bridging Planning and Reinforcement Learning
Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine

TL;DR
This paper introduces Search on the Replay Buffer (SoRB), an algorithm that combines planning and reinforcement learning by building a graph over observations in the replay buffer in order to solve long-horizon, sparse-reward tasks.
Contribution
The paper proposes a new method that builds a graph from replay buffer data using RL-derived edge weights, enabling planning over long horizons in high-dimensional environments.
Findings
SoRB solves sparse-reward tasks with horizons of over 100 steps.
It generalizes better than standard RL algorithms.
It generates subgoals by running graph search over the replay buffer.
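The idea of searching a graph built from replay buffer observations can be illustrated with a small sketch. This is not the authors' implementation: the function names (`stand_in_distance`, `build_graph`, `shortest_path`, `next_subgoal`), the `max_dist` cutoff value, and the use of 2D points as observations are all assumptions made for illustration. In particular, Euclidean distance stands in for the RL-derived edge weights described above.

```python
import heapq
import math


def stand_in_distance(a, b):
    # Placeholder (assumption): the method described above derives edge
    # weights from RL; Euclidean distance is used here only for illustration.
    return math.dist(a, b)


def build_graph(nodes, distance_fn, max_dist):
    # Connect every ordered pair of nodes whose estimated distance is small
    # enough to be trusted; longer edges are dropped.
    graph = {i: [] for i in range(len(nodes))}
    for i, a in enumerate(nodes):
        for j, b in enumerate(nodes):
            if i != j:
                d = distance_fn(a, b)
                if d <= max_dist:
                    graph[i].append((j, d))
    return graph


def shortest_path(graph, source, target):
    # Dijkstra's algorithm; returns a list of node indices or None.
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def next_subgoal(buffer, start, goal, max_dist=1.5):
    # Nodes: current observation, buffer observations, then the goal.
    nodes = [start] + list(buffer) + [goal]
    graph = build_graph(nodes, stand_in_distance, max_dist)
    path = shortest_path(graph, 0, len(nodes) - 1)
    if path is None:
        return goal  # no path found: aim at the goal directly
    return nodes[path[1]]  # first waypoint after the current observation
```

In this sketch a low-level policy would be asked to reach the returned waypoint rather than the distant goal, and the search would be repeated as the agent moves. The `max_dist` cutoff reflects the assumption that distance estimates are only trustworthy between nearby observations.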
Abstract
The history of learning for control has been an exciting back and forth between two broad classes of algorithms: planning and reinforcement learning. Planning algorithms effectively reason over long horizons, but assume access to a local policy and distance metric over collision-free paths. Reinforcement learning excels at learning policies and the relative values of states, but fails to plan over long horizons. Despite the successes of each method in various domains, tasks that require reasoning over long horizons with limited feedback and high-dimensional observations remain exceedingly challenging for both planning and reinforcement learning algorithms. Frustratingly, these sorts of tasks are potentially the most useful, as they are simple to design (a human only needs to provide an example goal state) and avoid reward shaping, which can bias the agent towards finding a sub-optimal…
Taxonomy
Topics: Reinforcement Learning in Robotics · Robotic Path Planning Algorithms · Multimodal Machine Learning Applications
