How do Offline Measures for Exploration in Reinforcement Learning behave?
Jakob J. Hollenstein, Sayantan Auddy, Matteo Saveriano, Erwan Renaudo, Justus Piater

TL;DR
This paper compares three existing offline exploration metrics for reinforcement learning on simple distributions, highlights their limitations, and introduces a fourth metric, uniform relative entropy, showing that implementation choices strongly affect the results.
Contribution
It provides a systematic comparison of offline exploration measures and proposes a novel metric with implementation insights.
Findings
Existing metrics have limitations on simple distributions.
Implementation choices significantly affect measure outcomes.
The new uniform relative entropy metric offers a promising alternative.
Abstract
Sufficient exploration is paramount for the success of a reinforcement learning agent. Yet, exploration is rarely assessed in an algorithm-independent way. We compare the behavior of three data-based, offline exploration metrics described in the literature on intuitive simple distributions and highlight problems to be aware of when using them. We propose a fourth metric, uniform relative entropy, and implement it using either a k-nearest-neighbor or a nearest-neighbor-ratio estimator, highlighting that the implementation choices have a profound impact on these measures.
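To make the k-nearest-neighbor variant more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that uniform relative entropy is the relative entropy (KL divergence) of a uniform reference distribution over a bounded box with respect to the collected data, and it uses a standard k-nearest-neighbor divergence estimator with a brute-force neighbor search. The function names, the divergence direction, the box bounds, and the sample sizes are all illustrative assumptions.

```python
import math
import random


def kth_nn_distance(point, others, k):
    """Euclidean distance from `point` to its k-th nearest neighbour in `others`."""
    dists = sorted(math.dist(point, o) for o in others)
    return max(dists[k - 1], 1e-12)  # clamp to avoid log(0) on duplicate points


def knn_kl_divergence(x, y, k=1):
    """k-NN estimate of D_KL(P || Q) from samples x ~ P and y ~ Q (lists of points)."""
    n, m, d = len(x), len(y), len(x[0])
    total = 0.0
    for i, xi in enumerate(x):
        rho = kth_nn_distance(xi, x[:i] + x[i + 1:], k)  # within x, excluding xi
        nu = kth_nn_distance(xi, y, k)                    # from xi into y
        total += math.log(nu / rho)
    return d * total / n + math.log(m / (n - 1))


def uniform_relative_entropy(data, low, high, n_ref=300, k=1, seed=0):
    """Assumed definition: D_KL(U || Q), with U uniform on the box [low, high]."""
    rng = random.Random(seed)
    ref = [[rng.uniform(lo, hi) for lo, hi in zip(low, high)] for _ in range(n_ref)]
    return knn_kl_divergence(ref, data, k)


# Hypothetical exploration data in the unit square: well spread vs. one corner.
rng = random.Random(1)
spread = [[rng.random(), rng.random()] for _ in range(300)]
narrow = [[0.2 * rng.random(), 0.2 * rng.random()] for _ in range(300)]

u_spread = uniform_relative_entropy(spread, [0.0, 0.0], [1.0, 1.0])
u_narrow = uniform_relative_entropy(narrow, [0.0, 0.0], [1.0, 1.0])
print(u_spread, u_narrow)
```

Under these assumptions, data that covers the box evenly should score close to zero, while data confined to one corner should score clearly higher. Changing `k`, the reference sample size, or the estimator itself shifts the reported values, which is consistent with the abstract's point that implementation choices matter.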
Taxonomy
Topics: Evolutionary Algorithms and Applications · Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms
