Solving Sokoban with forward-backward reinforcement learning
Yaron Shoham, Gal Elidan

TL;DR
This paper introduces a reinforcement learning method that combines forward and backward planning to efficiently solve sparse-reward problems such as Sokoban, outperforming existing learned solvers and rivaling handcrafted systems.
Contribution
The authors propose a new RL approach that integrates backward planning hints into forward learning, enabling effective solving of complex puzzles with limited training data.
Findings
Outperforms existing learned Sokoban solvers
Achieves state-of-the-art results with simple RL techniques
Learns efficiently from few practice levels
Abstract
Despite seminal advances in reinforcement learning in recent years, many domains where the rewards are sparse, e.g. given only at task completion, remain quite challenging. In such cases, it can be beneficial to tackle the task both from its beginning and end, and make the two ends meet. Existing approaches that do so, however, are not effective in the common scenario where the strategy needed near the end goal is very different from the one that is effective earlier on. In this work we propose a novel RL approach for such settings. In short, we first train a backward-looking agent with a simple relaxed goal, and then augment the state representation of the forward-looking agent with straightforward hint features. This allows the learned forward agent to leverage information from backward plans, without mimicking their policy. We demonstrate the efficacy of our approach on the…
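To make the two-stage idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It swaps the trained backward agent for a plain breadth-first search over box "pulls" starting at the goal square, and it ignores whether the player can actually walk between pulls (a simplification made here). The two hint features (a reachability flag and a pull count) are assumptions, since the abstract does not specify them. The level layout and the names backward_pull_distances and augment_state are hypothetical.

```python
from collections import deque

# Hypothetical toy level: '#' wall, '.' goal square, ' ' floor.
LEVEL = [
    "#######",
    "#     #",
    "# #   #",
    "#    .#",
    "#######",
]

DIRS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def is_free(level, cell):
    """A cell is free if it is not a wall."""
    r, c = cell
    return level[r][c] != "#"


def backward_pull_distances(level):
    """BFS from the goal squares over box pulls (stand-in for a backward agent).

    Returns {box_cell: fewest pulls needed to drag a box from a goal to box_cell}.
    Assumption: player reachability between pulls is ignored.
    """
    goals = [(r, c) for r, row in enumerate(level)
             for c, ch in enumerate(row) if ch == "."]
    dist = {g: 0 for g in goals}
    queue = deque(goals)
    while queue:
        box = queue.popleft()
        for dr, dc in DIRS:
            # Pulling in direction (dr, dc): the box moves one cell,
            # and the player ends up one cell beyond the box's new position.
            new_box = (box[0] + dr, box[1] + dc)
            player_end = (box[0] + 2 * dr, box[1] + 2 * dc)
            if not is_free(level, new_box) or not is_free(level, player_end):
                continue
            if new_box not in dist:
                dist[new_box] = dist[box] + 1
                queue.append(new_box)
    return dist


def augment_state(player, box, pull_dist):
    """Append assumed hint features to a raw forward state.

    Features: player row/col, box row/col, a flag saying whether the backward
    search ever placed a box on this cell, and the pull count (-1 if unseen).
    """
    on_backward_plan = 1.0 if box in pull_dist else 0.0
    pulls_to_goal = float(pull_dist.get(box, -1))
    return [*player, *box, on_backward_plan, pulls_to_goal]


PULL_DIST = backward_pull_distances(LEVEL)
```

For example, augment_state((1, 1), (3, 3), PULL_DIST) returns [1, 1, 3, 3, 1.0, 2.0], while a box on the top row such as (1, 3) gets the flag 0.0 because the backward search never reaches it. The forward agent receives these values only as extra inputs; nothing forces it to replay the backward moves, which mirrors the abstract's point about leveraging backward plans without mimicking their policy.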
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Data Stream Mining Techniques
