MarsExplorer: Exploration of Unknown Terrains via Deep Reinforcement Learning and Procedurally Generated Environments

Dimitrios I. Koutras; Athanasios Ch. Kapoutsis; Angelos A. Amanatiadis; Elias B. Kosmatopoulos

arXiv:2107.09996·cs.RO·November 9, 2021

TL;DR

This paper introduces MarsExplorer, an environment for training reinforcement learning agents to explore unknown terrains, demonstrating that RL policies can generalize well and outperform traditional methods in terrain coverage tasks.

Contribution

MarsExplorer provides a novel, procedurally generated environment for RL-based terrain exploration, enabling policies that generalize and adapt to unknown terrains without detailed robot models.

Findings

01. RL agents trained on MarsExplorer exceed human-level performance.

02. Policies learned with PPO adapt effectively to different terrain difficulties.

03. RL-based exploration strategies outperform frontier-based methods in coverage efficiency.

Abstract

This paper is an initial endeavor to bridge the gap between powerful Deep Reinforcement Learning methodologies and the problem of exploration/coverage of unknown terrains. Within this scope, MarsExplorer, an OpenAI Gym-compatible environment tailored to exploration/coverage of unknown areas, is presented. MarsExplorer translates the original robotics problem into a Reinforcement Learning setup that various off-the-shelf algorithms can tackle. Any learned policy can be applied directly to a robotic platform, with no need for an elaborate simulation model of the robot's dynamics to drive a separate learning/adaptation phase. One of its core features is the controllable multi-dimensional procedural generation of terrains, which is the key to producing policies with strong generalization capabilities. Four different state-of-the-art RL algorithms (A3C, PPO, Rainbow, and SAC) are trained on…
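To make the idea of casting exploration/coverage as a Gym-style problem more concrete, here is a minimal sketch of a toy grid environment. It is not the MarsExplorer implementation: the class name `ToyCoverageEnv`, the cell codes, the reward values, and every parameter are assumptions chosen for illustration. It only mirrors the classic Gym `reset`/`step` signature and redraws a random obstacle layout each episode as a simple stand-in for procedural terrain generation.

```python
import numpy as np


class ToyCoverageEnv:
    """Toy grid-exploration environment with a Gym-style reset/step interface.

    Hypothetical illustration only; names, rewards, and defaults are assumptions.
    """

    # Cell codes used in the observation (assumed encoding).
    UNKNOWN, FREE, AGENT, OBSTACLE = 0.0, 0.3, 0.6, 1.0
    # Discrete actions: up, down, left, right.
    MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

    def __init__(self, size=12, n_obstacles=10, sensor_range=2,
                 max_steps=200, seed=None):
        self.size = size
        self.n_obstacles = n_obstacles
        self.sensor_range = sensor_range
        self.max_steps = max_steps
        self.rng = np.random.default_rng(seed)

    def reset(self):
        # Procedural generation: a fresh random obstacle layout per episode.
        self.terrain = np.zeros((self.size, self.size), dtype=bool)
        cells = self.rng.choice(self.size * self.size - 1,
                                size=self.n_obstacles, replace=False) + 1
        self.terrain.flat[cells] = True  # flat cell 0 stays free for the agent
        self.pos = (0, 0)
        self.explored = np.zeros_like(self.terrain)
        self.steps = 0
        self._sense()
        return self._obs()

    def _sense(self):
        # Reveal a square window around the agent; return newly revealed cells.
        r, c = self.pos
        k = self.sensor_range
        before = int(self.explored.sum())
        self.explored[max(r - k, 0):r + k + 1, max(c - k, 0):c + k + 1] = True
        return int(self.explored.sum()) - before

    def _obs(self):
        obs = np.full((self.size, self.size), self.UNKNOWN, dtype=np.float32)
        obs[self.explored & ~self.terrain] = self.FREE
        obs[self.explored & self.terrain] = self.OBSTACLE
        obs[self.pos] = self.AGENT
        return obs

    def step(self, action):
        dr, dc = self.MOVES[int(action)]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        self.steps += 1
        in_bounds = 0 <= r < self.size and 0 <= c < self.size
        if not in_bounds or self.terrain[r, c]:
            reward = -1.0  # penalty for hitting a wall or an obstacle
        else:
            self.pos = (r, c)
            reward = float(self._sense()) - 0.1  # new cells minus a step cost
        coverage = float(self.explored.mean())
        done = coverage >= 0.95 or self.steps >= self.max_steps
        return self._obs(), reward, done, {"coverage": coverage}


# Random-policy rollout; a trained agent would replace the random action.
env = ToyCoverageEnv(seed=0)
obs, done = env.reset(), False
while not done:
    obs, reward, done, info = env.step(env.rng.integers(4))
print("final coverage:", info["coverage"])
```

Because the observation is just a 2D map of what has been revealed so far, a policy trained on such an interface does not depend on a model of the robot's dynamics, which is the property the abstract highlights.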

Taxonomy

Methods: Entropy Regularization · Proximal Policy Optimization