Safer Deep RL with Shallow MCTS: A Case Study in Pommerman
Bilal Kartal, Pablo Hernandez-Leal, Chao Gao, and Matthew E. Taylor

TL;DR
This paper introduces a framework combining shallow Monte Carlo tree search with deep reinforcement learning to improve safety and learning efficiency in Pommerman, a challenging multi-agent environment with sparse rewards.
Contribution
The paper shows that a non-expert demonstrator such as shallow MCTS can guide deep RL, reducing catastrophic failures and improving both learning speed and policy quality.
Findings
Shallow MCTS improves safety by reducing catastrophic events.
The proposed method learns faster than vanilla deep RL.
Policies converge to better strategies in Pommerman.
Abstract
Safe reinforcement learning has many variants and remains an open research problem. Here, we focus on how to use action guidance by means of a non-expert demonstrator to avoid catastrophic events in a domain with sparse, delayed, and deceptive rewards: the recently proposed multi-agent benchmark of Pommerman. This domain is very challenging for reinforcement learning (RL): past work has shown that model-free RL algorithms fail to achieve significant learning. In this paper, we shed light on the reasons behind this failure by exemplifying and analyzing the high rate of catastrophic events (i.e., suicides) that happen under random exploration in this domain. While model-free random exploration is typically futile, we propose a new framework where even a non-expert simulated demonstrator, e.g., planning algorithms such as Monte Carlo tree search with a small number of rollouts, can…
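To make the idea of action guidance from a shallow planner concrete, here is a minimal sketch. It is not the authors' implementation: the corridor environment, the function names, and all parameter values are hypothetical, and the planner is flat Monte Carlo search with random rollouts (a simplification of MCTS with no tree and no UCB selection). The `guidance_loss` function shows one plausible way a learner could be nudged toward the demonstrator's action, namely a cross-entropy term that would be added to the usual RL loss; the abstract above does not confirm that this is the exact mechanism used.

```python
import math
import random

# Toy 1-D corridor (hypothetical stand-in, not Pommerman):
# cells 0..4, cell 0 is a hazard (catastrophe, reward -1), cell 4 is the goal (+1).
HAZARD, GOAL = 0, 4
ACTIONS = (-1, 0, +1)  # left, stay, right


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    nxt = max(HAZARD, min(GOAL, state + action))
    if nxt == HAZARD:
        return nxt, -1.0, True
    if nxt == GOAL:
        return nxt, 1.0, True
    return nxt, 0.0, False


def rollout(state, first_action, depth, gamma, rng):
    """Discounted return of taking first_action, then acting uniformly at random."""
    total, discount, action = 0.0, 1.0, first_action
    for _ in range(depth):
        state, reward, done = step(state, action)
        total += discount * reward
        if done:
            break
        discount *= gamma
        action = rng.choice(ACTIONS)
    return total


def shallow_planner(state, n_rollouts=8, depth=4, gamma=0.9, seed=0):
    """Pick the action index with the best mean return over a few random rollouts."""
    rng = random.Random(seed)
    means = []
    for a in ACTIONS:
        returns = [rollout(state, a, depth, gamma, rng) for _ in range(n_rollouts)]
        means.append(sum(returns) / n_rollouts)
    return max(range(len(ACTIONS)), key=means.__getitem__)


def guidance_loss(logits, demo_action):
    """Cross-entropy between softmax(logits) and the demonstrator's action index."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[demo_action]


if __name__ == "__main__":
    state = 1  # one step away from the hazard
    demo = shallow_planner(state)
    print("demonstrator action:", ACTIONS[demo])
    print("guidance loss for uniform policy:", guidance_loss([0.0, 0.0, 0.0], demo))
```

Even with only eight rollouts per action, the planner in this toy setting never steps directly into the hazard, because that action always returns -1 while any delayed failure is discounted. That is the sense in which a weak, non-expert demonstrator can still supply a useful safety signal.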
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Sports Analytics and Performance
