# Generative Exploration and Exploitation

**Authors:** Jiechuan Jiang, Zongqing Lu

arXiv: 1904.09605 · 2019-11-21

## TL;DR

This paper introduces GENE, a novel method for reinforcement learning that automatically balances exploration and exploitation by generating start states, significantly improving performance in environments with sparse rewards.

## Contribution

GENE adaptively trades off exploration and exploitation without prior knowledge of the environment, and it can be combined with any RL algorithm, whether on-policy or off-policy, single-agent or multi-agent.

## Key findings

- GENE outperforms existing methods in sparse-reward tasks that provide only binary rewards
- Empirical results on Maze, Maze Ant, and Cooperative Navigation show significant improvements
- Ablation studies confirm the emergence of progressive exploration and automatic reversing

## Abstract

Sparse reward is one of the biggest challenges in reinforcement learning (RL). In this paper, we propose a novel method called Generative Exploration and Exploitation (GENE) to overcome sparse reward. GENE automatically generates start states to encourage the agent to explore the environment and to exploit received reward signals. GENE can adaptively trade off between exploration and exploitation according to the varying distributions of states experienced by the agent as the learning progresses. GENE relies on no prior knowledge about the environment and can be combined with any RL algorithm, whether on-policy or off-policy, single-agent or multi-agent. Empirically, we demonstrate that GENE significantly outperforms existing methods in three tasks with only binary rewards, including Maze, Maze Ant, and Cooperative Navigation. Ablation studies verify the emergence of progressive exploration and automatic reversing.
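
The abstract describes generating start states from the distribution of states the agent has experienced, but this summary does not give the mechanism. The sketch below illustrates one way that idea could look, and it is not the authors' implementation. It assumes that visited states are low-dimensional vectors, that a plain Gaussian kernel density estimate is an adequate density model, and that start states are resampled from visited states with probability inversely proportional to estimated density. The function names `gaussian_kde_density` and `sample_start_states` and the `bandwidth` parameter are hypothetical.

```python
import numpy as np


def gaussian_kde_density(points, queries, bandwidth=0.5):
    """Estimate the density at each query using a Gaussian KDE over `points`.

    Illustrative assumption: an isotropic Gaussian kernel with a fixed bandwidth.
    """
    points = np.asarray(points, dtype=float)
    queries = np.asarray(queries, dtype=float)
    # Pairwise squared distances, shape (n_queries, n_points).
    diff = queries[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    dim = points.shape[1]
    norm = (2.0 * np.pi * bandwidth ** 2) ** (dim / 2.0)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2)).mean(axis=1) / norm


def sample_start_states(visited, n_starts, bandwidth=0.5, rng=None):
    """Resample start states from `visited`, favoring low-density regions.

    Hypothetical rule for illustration: selection probability is proportional
    to the inverse of the estimated density at each visited state.
    """
    rng = np.random.default_rng() if rng is None else rng
    visited = np.asarray(visited, dtype=float)
    density = gaussian_kde_density(visited, visited, bandwidth)
    weights = 1.0 / (density + 1e-8)  # small constant avoids division by zero
    probs = weights / weights.sum()
    idx = rng.choice(len(visited), size=n_starts, p=probs)
    return visited[idx]
```

Under these assumptions, states in rarely visited regions are chosen as start states far more often than their share of the visited set would suggest. As the agent's visitation distribution changes during training, the sampled start states shift with it.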

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.09605/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/1904.09605/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/1904.09605/full.md

---
Source: https://tomesphere.com/paper/1904.09605