Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments
Daochen Zha, Wenye Ma, Lei Yuan, Xia Hu, Ji Liu

TL;DR
RAPID is an episode-level exploration method for procedurally-generated environments that improves exploration by imitating high-scoring episodes, leading to better sample efficiency and final performance than intrinsic-reward methods.
Contribution
The paper introduces RAPID, a novel episode-based exploration approach that leverages episode ranking and imitation in procedurally-generated environments.
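The ranking-and-imitation idea can be illustrated with a small buffer that keeps only the best-scoring episodes and serves their state-action pairs as supervised targets. This is a minimal sketch, not the authors' implementation. It assumes that each episode already has a scalar exploration score, that capacity is counted in episodes (a simplification), and that the sampled pairs would feed a behavior-cloning loss. The class `RankingBuffer` and its methods are hypothetical names.

```python
import heapq
import itertools
import random
from typing import List, Tuple

# Hypothetical transition format: (state, action); states can be any object.
Transition = Tuple[object, int]


class RankingBuffer:
    """Hypothetical buffer keeping the top-`capacity` episodes by score."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # Min-heap of (score, tie_breaker, episode): the lowest score sits at index 0.
        self._heap: List[Tuple[float, int, List[Transition]]] = []
        # Unique counter so heapq never has to compare two episode lists.
        self._counter = itertools.count()

    def add(self, score: float, episode: List[Transition]) -> None:
        item = (score, next(self._counter), episode)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
        elif score > self._heap[0][0]:
            # Evict the lowest-ranked stored episode in favor of the new one.
            heapq.heapreplace(self._heap, item)

    def scores(self) -> List[float]:
        # Scores of the stored episodes, best first.
        return sorted((s for s, _, _ in self._heap), reverse=True)

    def sample(self, batch_size: int) -> List[Transition]:
        # Flatten stored episodes into state-action pairs for imitation.
        pairs = [t for _, _, ep in self._heap for t in ep]
        return random.sample(pairs, min(batch_size, len(pairs)))


buf = RankingBuffer(capacity=2)
buf.add(0.1, [("s0", 0)])
buf.add(0.9, [("s1", 1), ("s2", 2)])
buf.add(0.5, [("s3", 3)])  # displaces the 0.1 episode
print(buf.scores())  # [0.9, 0.5]
```

A heap keeps insertion at O(log capacity), and the lowest-ranked episode is always the one evicted. Only the stored, highly ranked episodes contribute training pairs for imitation.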
Findings
RAPID outperforms state-of-the-art intrinsic reward methods in various environments.
RAPID achieves higher sample efficiency and final performance than these baselines.
The method is effective across MiniGrid, MiniWorld, and MuJoCo tasks.
Abstract
Exploration under sparse reward is a long-standing challenge of model-free reinforcement learning. The state-of-the-art methods address this challenge by introducing intrinsic rewards to encourage exploration in novel states or uncertain environment dynamics. Unfortunately, methods based on intrinsic rewards often fall short in procedurally-generated environments, where a different environment is generated in each episode so that the agent is not likely to visit the same state more than once. Motivated by how humans distinguish good exploration behaviors by looking into the entire episode, we introduce RAPID, a simple yet effective episode-level exploration method for procedurally-generated environments. RAPID regards each episode as a whole and gives an episodic exploration score from both per-episode and long-term views. Those highly scored episodes are treated as good exploration…
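The abstract says each episode receives a score "from both per-episode and long-term views." One way to make that concrete is shown below. This is a minimal sketch under stated assumptions, not the paper's exact formula. It assumes the per-episode view is the fraction of distinct states visited within the episode, the long-term view is an average count-based novelty term over a persistent visitation counter, and both are added to the extrinsic return with weights. The function `episodic_score` and the weights `w_ext`, `w_local`, `w_global` (and their default values) are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def episodic_score(
    states: Sequence[Hashable],
    extrinsic_return: float,
    global_counts: Counter,
    w_ext: float = 1.0,       # hypothetical weight on extrinsic return
    w_local: float = 0.1,     # hypothetical weight on the per-episode term
    w_global: float = 0.001,  # hypothetical weight on the long-term term
) -> float:
    """Hypothetical episode-level exploration score."""
    length = len(states)
    if length == 0:
        return w_ext * extrinsic_return
    # Long-term view: fold this episode's states into persistent visit counts.
    global_counts.update(states)
    # Per-episode view: fraction of distinct states within this episode.
    local = len(set(states)) / length
    # Long-term view: states with small lifetime counts contribute more.
    global_term = sum(1.0 / math.sqrt(global_counts[s]) for s in states) / length
    return w_ext * extrinsic_return + w_local * local + w_global * global_term


counts = Counter()
print(episodic_score(["a", "b", "a", "c"], 0.0, counts, w_ext=0.0, w_global=0.0, w_local=1.0))  # 0.75
```

Because the counter persists across episodes, replaying the same states later yields a smaller long-term term. This matches the intuition that an episode is a good exploration example only if it reaches states that are novel both within the episode and over the agent's history.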
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Social Robot Interaction and HRI
