How do Offline Measures for Exploration in Reinforcement Learning behave?

Jakob J. Hollenstein; Sayantan Auddy; Matteo Saveriano; Erwan Renaudo; Justus Piater

arXiv:2010.15533·cs.LG·October 30, 2020

Open Access

TL;DR

This paper compares existing offline exploration metrics in reinforcement learning, highlights their limitations, and introduces a new metric, uniform relative entropy, emphasizing the importance of implementation choices.

Contribution

It provides a systematic comparison of offline exploration measures and proposes a novel metric with implementation insights.

Findings

01 Existing metrics have limitations on simple distributions.

02 Implementation choices significantly affect measure outcomes.

03 The new uniform relative entropy metric offers a promising alternative.

Abstract

Sufficient exploration is paramount for the success of a reinforcement learning agent. Yet, exploration is rarely assessed in an algorithm-independent way. We compare the behavior of three data-based, offline exploration metrics described in the literature on intuitive simple distributions and highlight problems to be aware of when using them. We propose a fourth metric, uniform relative entropy, and implement it using either a k-nearest-neighbor or a nearest-neighbor-ratio estimator, highlighting that the implementation choices have a profound impact on these measures.
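
To make the k-nearest-neighbor variant concrete, here is a minimal sketch. It is not the authors' implementation. It assumes that uniform relative entropy can be read as the KL divergence D_KL(p || U) = log(volume) - H(p) between the collected data and a uniform reference over a bounded box, and it estimates the entropy H(p) with the standard Kozachenko-Leonenko k-NN estimator. The function names, the box bounds `low`/`high`, and the choice `k=3` are illustrative assumptions; the paper's exact definition and estimator details may differ, and the nearest-neighbor-ratio variant is not shown.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def knn_entropy(points, k=3):
    """Kozachenko-Leonenko k-NN estimate of differential entropy, in nats."""
    n = len(points)
    d = len(points[0])
    # Log-volume of the d-dimensional Euclidean unit ball.
    log_unit_ball = (d / 2.0) * math.log(math.pi) - math.lgamma(d / 2.0 + 1.0)
    log_dist_sum = 0.0
    for i, p in enumerate(points):
        # Brute-force distances to all other points (fine for small n).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        # Distance to the k-th nearest neighbor, clamped to avoid log(0).
        log_dist_sum += math.log(max(dists[k - 1], 1e-12))
    return digamma_int(n) - digamma_int(k) + log_unit_ball + d * log_dist_sum / n


def relative_entropy_to_uniform(points, low, high, k=3):
    """Assumed form: D_KL(p || U) = log(volume of box) - H(p)."""
    log_volume = sum(math.log(h - l) for l, h in zip(low, high))
    return log_volume - knn_entropy(points, k)


if __name__ == "__main__":
    random.seed(0)
    # Samples spread over the whole unit square versus samples in a small patch.
    spread = [(random.random(), random.random()) for _ in range(300)]
    narrow = [(0.4 + 0.2 * random.random(), 0.4 + 0.2 * random.random())
              for _ in range(300)]
    box_low, box_high = (0.0, 0.0), (1.0, 1.0)
    print("spread:", relative_entropy_to_uniform(spread, box_low, box_high))
    print("narrow:", relative_entropy_to_uniform(narrow, box_low, box_high))
```

Under these assumptions, data that covers the box evenly should score near zero, while data concentrated in a small region should score clearly higher. Changing `k` or the distance norm shifts the estimates, which is in keeping with the abstract's point that implementation choices affect these measures.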

Taxonomy

Topics: Evolutionary Algorithms and Applications · Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms