Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments

Daochen Zha; Wenye Ma; Lei Yuan; Xia Hu; Ji Liu

arXiv:2101.08152 · cs.LG · February 5, 2021 · 5 citations

TL;DR

RAPID is an episode-level exploration method for procedurally-generated environments that improves exploration by imitating high-scoring episodes, leading to better sample efficiency and performance.

Contribution

The paper introduces RAPID, a novel episode-based exploration approach that leverages episode ranking and imitation in procedurally-generated environments.

Findings

01. RAPID outperforms state-of-the-art intrinsic reward methods in various environments.

02. RAPID achieves higher sample efficiency and final performance.

03. The method is effective across MiniGrid, MiniWorld, and MuJoCo tasks.

Abstract

Exploration under sparse reward is a long-standing challenge of model-free reinforcement learning. The state-of-the-art methods address this challenge by introducing intrinsic rewards to encourage exploration in novel states or uncertain environment dynamics. Unfortunately, methods based on intrinsic rewards often fall short in procedurally-generated environments, where a different environment is generated in each episode so that the agent is not likely to visit the same state more than once. Motivated by how humans distinguish good exploration behaviors by looking into the entire episode, we introduce RAPID, a simple yet effective episode-level exploration method for procedurally-generated environments. RAPID regards each episode as a whole and gives an episodic exploration score from both per-episode and long-term views. Those highly scored episodes are treated as good exploration…
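The episode-ranking idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not stated on this page: the class name RankingBuffer, the two scoring functions (fraction of distinct states for the per-episode view, mean inverse square-root visit count for the long-term view), the weights w_local and w_global, and the requirement that states are hashable and episodes are non-empty. It is not the authors' implementation.

```python
import heapq
import math
from collections import Counter
from itertools import count


def local_score(states):
    """Per-episode view (assumed form): fraction of distinct states in the episode."""
    return len(set(states)) / len(states)


def global_score(states, visit_counts):
    """Long-term view (assumed form): mean of 1/sqrt(N(s)) over the episode,
    where N(s) is the lifetime visit count of state s."""
    return sum(1.0 / math.sqrt(visit_counts[s]) for s in states) / len(states)


class RankingBuffer:
    """Hypothetical buffer that keeps the `capacity` highest-scoring episodes."""

    def __init__(self, capacity, w_local=1.0, w_global=1.0):
        self.capacity = capacity
        self.w_local = w_local
        self.w_global = w_global
        self.visit_counts = Counter()   # lifetime state visit counts
        self._heap = []                 # min-heap of (score, tiebreak, episode)
        self._tiebreak = count()        # unique ids so episodes are never compared

    def add(self, states, actions):
        """Score one finished episode and keep it only if it ranks high enough."""
        self.visit_counts.update(states)
        score = (self.w_local * local_score(states)
                 + self.w_global * global_score(states, self.visit_counts))
        item = (score, next(self._tiebreak), list(zip(states, actions)))
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
        else:
            # Push the new episode, then evict whichever episode scores lowest.
            heapq.heappushpop(self._heap, item)
        return score

    def ranked(self):
        """Stored episodes, best score first."""
        return sorted(self._heap, key=lambda it: it[0], reverse=True)

    def imitation_pairs(self):
        """Flattened (state, action) pairs to feed a behavior-cloning update."""
        return [pair for _, _, episode in self._heap for pair in episode]


if __name__ == "__main__":
    buf = RankingBuffer(capacity=2)
    buf.add([0, 0, 0, 0], [1, 1, 1, 1])   # revisits one state: low score
    buf.add([1, 2, 3, 4], [0, 1, 0, 1])   # all states new: high score
    buf.add([0, 1, 5, 6], [2, 2, 2, 2])   # mixed: evicts the first episode
    for score, _, episode in buf.ranked():
        print(round(score, 3), episode)
```

In a full training loop, the pairs returned by imitation_pairs() would be used for a supervised imitation step on the policy; that step is omitted here because it depends on the policy model.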

Videos

Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Social Robot Interaction and HRI