Potential-based Reward Shaping in Sokoban

Zhao Yang; Mike Preuss; Aske Plaat

arXiv:2109.05022 · cs.LG · September 14, 2021 · 1 citation

TL;DR

This paper explores using A* search to automatically generate potential functions for reward shaping in Sokoban. Learning with the shaped reward is faster than learning from scratch, and the results show that multiple instances can be solved with the help of reward shaping.

Contribution

It introduces a method that uses A* search to generate the potential function for reward shaping in Sokoban automatically, in contrast to prior work that relied on expert knowledge to construct potential functions.
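As an illustration of how a search algorithm can yield a potential, the sketch below runs A* on a small grid and uses the negated shortest-path length from a box to a goal as the potential. This is not taken from the paper: the grid encoding (`#` for walls), the helper names `astar_distance` and `potential`, the Manhattan heuristic, and the choice to ignore Sokoban's pushing constraints are all assumptions made to keep the example short.

```python
import heapq


def astar_distance(grid, start, goal):
    """Length of the shortest 4-connected path from start to goal.

    grid is a list of equal-length strings; '#' marks a wall (assumed encoding).
    Returns None when the goal cannot be reached.
    """
    def heuristic(cell):
        # Manhattan distance is admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(heuristic(start), 0, start)]  # entries are (f, g, cell)
    best_cost = {start: 0}
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            return cost
        if cost > best_cost.get(cell, float("inf")):
            continue  # stale queue entry
        row, col = cell
        for nxt in ((row + 1, col), (row - 1, col), (row, col + 1), (row, col - 1)):
            r, c = nxt
            if not (0 <= r < len(grid) and 0 <= c < len(grid[r])):
                continue
            if grid[r][c] == "#":
                continue
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                heapq.heappush(frontier, (new_cost + heuristic(nxt), new_cost, nxt))
    return None


def potential(grid, box, goal):
    """Hypothetical potential: negated A* distance, so closer states score higher."""
    distance = astar_distance(grid, box, goal)
    if distance is None:
        # Finite lower bound for unreachable goals (an assumption of this sketch).
        return -float(len(grid) * len(grid[0]))
    return -float(distance)


GRID = [
    "#####",
    "#  G#",
    "# # #",
    "#B  #",
    "#####",
]
BOX, GOAL = (3, 1), (1, 3)
print(potential(GRID, BOX, GOAL))
```

A real Sokoban potential would also have to account for where the agent can stand to push a box and for multiple boxes and goals; the sketch only shows the search-to-potential step.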

Findings

01. Reward shaping accelerates learning in Sokoban.

02. Distance functions are effective as potential functions.

03. Potential-based shaping enables generalization to unseen instances.

Abstract

Learning to solve sparse-reward reinforcement learning problems is difficult because of the lack of guidance towards the goal. In some problems, however, prior knowledge can be used to augment the learning process. Reward shaping is a way to incorporate prior knowledge into the original reward function in order to speed up learning. While previous work has investigated the use of expert knowledge to generate potential functions, in this work we study whether a search algorithm (A*) can be used to automatically generate a potential function for reward shaping in Sokoban, a well-known planning task. The results show that learning with a shaped reward function is faster than learning from scratch. Our results indicate that distance functions could be suitable potential functions for Sokoban. This work demonstrates the possibility of solving multiple instances with the help of reward shaping. The result…
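For readers unfamiliar with the term, potential-based reward shaping in its standard form (Ng, Harada and Russell, 1999) adds F(s, s') = γΦ(s') − Φ(s) to the environment reward, where Φ is the potential function and γ the discount factor. The sketch below is a generic illustration rather than the paper's code: the function names and the example potentials (negated distances to a goal) are assumptions. It shows that the discounted sum of shaping terms along a trajectory telescopes to γ^T Φ(s_T) − Φ(s_0), so it depends only on the first and last states.

```python
def shaping_term(phi_s, phi_next, gamma):
    """Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s)."""
    return gamma * phi_next - phi_s


def shaped_reward(reward, phi_s, phi_next, gamma):
    """Environment reward plus the shaping term."""
    return reward + shaping_term(phi_s, phi_next, gamma)


def discounted_shaping_sum(potentials, gamma):
    """Discounted sum of shaping terms along a trajectory of state potentials."""
    return sum(
        gamma ** t * shaping_term(potentials[t], potentials[t + 1], gamma)
        for t in range(len(potentials) - 1)
    )


GAMMA = 0.9
# Hypothetical potentials: negated distance to the goal at each visited state.
direct_path = [-4.0, -3.0, -2.0, -1.0, 0.0]
detour_path = [-4.0, -5.0, -4.0, -3.0, -2.0, -1.0, 0.0]

# Both trajectories start at potential -4 and end at potential 0, so the
# total shaping contribution is the same even though the paths differ.
print(discounted_shaping_sum(direct_path, GAMMA))
print(discounted_shaping_sum(detour_path, GAMMA))
```

Because the shaping contribution depends only on the endpoints, it changes how quickly reward signal reaches the learner without changing which policies are optimal, which is the reason this form of shaping is commonly preferred over ad hoc bonus rewards.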

Taxonomy

Topics: Reinforcement Learning in Robotics