# On Hard Exploration for Reinforcement Learning: a Case Study in Pommerman

**Authors:** Chao Gao, Bilal Kartal, Pablo Hernandez-Leal, Matthew E. Taylor

arXiv: 1907.11788 · 2019-07-30

## TL;DR

This paper analyzes why random exploration fails in Pommerman, a multi-agent benchmark with sparse, delayed, and deceptive rewards, and proposes a model-based reasoning module that makes exploration safer and improves learning by pruning actions that will surely kill the agent.

## Contribution

The paper introduces a model-based automatic reasoning module that prunes actions that will surely lead the agent to death, which makes exploration safer and significantly improves learning in Pommerman.

## Key findings

- Model-free random exploration is typically futile in Pommerman
- Pruning actions that surely lead to the agent's death makes exploration safer
- The reasoning module significantly improves learning in experiments

## Abstract

How to best explore in domains with sparse, delayed, and deceptive rewards is an important open problem for reinforcement learning (RL). This paper considers one such domain, the recently proposed multi-agent benchmark of Pommerman. This domain is very challenging for RL: past work has shown that model-free RL algorithms fail to achieve significant learning without artificially reducing the environment's complexity. In this paper, we illuminate reasons behind this failure by providing a thorough analysis of the hardness of random exploration in Pommerman. While model-free random exploration is typically futile, we develop a model-based automatic reasoning module that can be used for safer exploration by pruning actions that will surely lead the agent to death. We empirically demonstrate that this module can significantly improve learning.
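The idea of pruning actions that surely lead to death can be illustrated with a minimal sketch. This is not the authors' module: it assumes a simplified grid with a one-step lookahead, cross-shaped blasts blocked by walls, and hypothetical names (`MOVES`, `blast_cells`, `safe_actions`) that do not come from the paper or the Pommerman API. The fallback to all legal moves when nothing is safe is also an assumption of this sketch.

```python
from typing import Dict, List, Set, Tuple

Pos = Tuple[int, int]
Bomb = Tuple[Pos, int, int]  # (position, steps until explosion, blast strength)

# Hypothetical action set; names are illustrative, not the Pommerman API.
MOVES: Dict[str, Pos] = {
    "stop": (0, 0),
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}


def blast_cells(bomb: Pos, strength: int, walls: Set[Pos], size: int) -> Set[Pos]:
    """Cells covered by a cross-shaped blast; walls stop the flame."""
    cells = {bomb}
    for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        for k in range(1, strength + 1):
            r, c = bomb[0] + dr * k, bomb[1] + dc * k
            if not (0 <= r < size and 0 <= c < size) or (r, c) in walls:
                break
            cells.add((r, c))
    return cells


def safe_actions(agent: Pos, bombs: List[Bomb], walls: Set[Pos], size: int) -> List[str]:
    """Return the legal actions that do not end in a cell about to explode."""
    # Cells that will be in flames right after the next step (assumed model).
    deadly: Set[Pos] = set()
    for pos, timer, strength in bombs:
        if timer <= 1:
            deadly |= blast_cells(pos, strength, walls, size)
    bomb_positions = {pos for pos, _, _ in bombs}

    legal: List[str] = []
    kept: List[str] = []
    for name, (dr, dc) in MOVES.items():
        nxt = (agent[0] + dr, agent[1] + dc)
        in_bounds = 0 <= nxt[0] < size and 0 <= nxt[1] < size
        if not in_bounds or nxt in walls:
            continue  # illegal move in this simplified model
        if name != "stop" and nxt in bomb_positions:
            continue  # cannot walk onto a bomb
        legal.append(name)
        if nxt not in deadly:
            kept.append(name)
    # Assumption: if every action is deadly, fall back to all legal moves
    # so the learner always has something to sample from.
    return kept if kept else legal


if __name__ == "__main__":
    # A bomb two cells to the left is about to explode and reaches the agent.
    print(safe_actions((2, 2), [((2, 0), 1, 2)], set(), 5))
```

An exploring policy would then sample only from the returned list, for example by masking the pruned actions before drawing from its action distribution. A one-step check like this removes only the most immediate deaths; reasoning over longer horizons would need a deeper forward model.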

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.11788/full.md

## Figures

23 figures with captions in the complete paper: https://tomesphere.com/paper/1907.11788/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/1907.11788/full.md

---
Source: https://tomesphere.com/paper/1907.11788