Solving Sokoban with forward-backward reinforcement learning

Yaron Shoham; Gal Elidan

arXiv:2105.01904 · cs.LG · May 25, 2021

TL;DR

This paper introduces a reinforcement learning method that combines forward and backward planning to solve sparse-reward problems such as Sokoban, outperforming existing learned solvers and rivaling handcrafted systems.

Contribution

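The authors propose an RL approach that first trains a backward-looking agent on a relaxed goal, then adds hint features derived from its plans to the forward-looking agent's state representation. The forward agent can use the backward information without copying the backward policy, and it learns to solve complex puzzles from a small number of practice levels.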

Findings

01. Outperforms existing learned Sokoban solvers

02. Achieves state-of-the-art results with simple RL techniques

03. Learns efficiently from few practice levels

Abstract

Despite seminal advances in reinforcement learning in recent years, many domains where the rewards are sparse, e.g. given only at task completion, remain quite challenging. In such cases, it can be beneficial to tackle the task both from its beginning and end, and make the two ends meet. Existing approaches that do so, however, are not effective in the common scenario where the strategy needed near the end goal is very different from the one that is effective earlier on. In this work we propose a novel RL approach for such settings. In short, we first train a backward-looking agent with a simple relaxed goal, and then augment the state representation of the forward-looking agent with straightforward hint features. This allows the learned forward agent to leverage information from backward plans, without mimicking their policy. We demonstrate the efficacy of our approach on the…
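To make the idea of "hint features" concrete, here is a minimal Python sketch of augmenting a forward agent's observation with information taken from backward plans. The abstract does not say which hint features the authors use, so everything here is an illustrative assumption: the names `collect_backward_states`, `hint_features`, and `augment_observation` are hypothetical, as are the two features (an exact-match flag and a box-overlap ratio) and the representation of a state as a set of box positions.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Pos = Tuple[int, int]
BoxState = FrozenSet[Pos]  # assumption: a state is summarized by its box positions


def collect_backward_states(backward_plans: Iterable[List[BoxState]]) -> Set[BoxState]:
    """Gather every box configuration visited by any backward plan."""
    seen: Set[BoxState] = set()
    for plan in backward_plans:
        seen.update(plan)
    return seen


def hint_features(boxes: BoxState, backward_states: Set[BoxState]) -> List[float]:
    """Return two hypothetical hint features for the forward agent.

    on_plan: 1.0 if the current configuration was visited by a backward plan.
    overlap: largest fraction of current boxes that coincide with the boxes
             of any single backward-visited configuration.
    """
    on_plan = 1.0 if boxes in backward_states else 0.0
    if not boxes:
        return [on_plan, 0.0]
    overlap = max(
        (len(boxes & state) / len(boxes) for state in backward_states),
        default=0.0,
    )
    return [on_plan, overlap]


def augment_observation(
    observation: List[float], boxes: BoxState, backward_states: Set[BoxState]
) -> List[float]:
    """Append the hints to the raw observation; the policy itself is unchanged."""
    return observation + hint_features(boxes, backward_states)


# Usage: one backward plan that starts at the goal and pulls one box away.
goal = frozenset({(1, 1), (2, 2)})
pulled = frozenset({(1, 1), (2, 3)})
backward_states = collect_backward_states([[goal, pulled]])

current = frozenset({(1, 1), (5, 5)})
augmented = augment_observation([0.2], current, backward_states)
```

In this sketch the forward agent only receives extra input features. It is never trained to imitate the backward policy, which matches the separation the abstract describes.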

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Data Stream Mining Techniques