Diverse Exploration for Fast and Safe Policy Improvement
Andrew Cohen, Lei Yu, Robert Wright

TL;DR
This paper introduces diverse exploration (DE), a novel strategy in online reinforcement learning that uses a set of safe, diverse policies to achieve rapid and safe policy improvements.
Contribution
The paper proposes a new exploration method, diverse exploration, with theoretical justification and empirical validation for safe, fast policy improvement in reinforcement learning.
Findings
Diversity among behavior policies enables effective exploration without sacrificing exploitation.
An online policy improvement framework implementing DE achieves both fast policy improvement and safe online performance.
Empirical results confirm the effectiveness of DE in online RL.
Abstract
We study an important yet under-addressed problem: quickly and safely improving policies in online reinforcement learning domains. As a solution, we propose a novel exploration strategy, diverse exploration (DE), which learns and deploys a diverse set of safe policies to explore the environment. We provide theory for DE that explains why diversity in behavior policies enables effective exploration without sacrificing exploitation. Our empirical study shows that an online policy improvement framework implementing the DE strategy can achieve both fast policy improvement and safe online performance.
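To make the idea of "a diverse set of safe policies" concrete, here is a minimal toy sketch in Python. It is not the authors' algorithm: this summary does not describe the paper's safety test or its diversity mechanism, so everything below is an assumption for illustration. Policies are probability vectors over bandit arms, "safe" means the estimated value is no worse than the baseline's (a stand-in for a proper high-confidence test), and diversity is encouraged by greedy farthest-point selection under L1 distance. All function names and parameters are hypothetical.

```python
import random


def expected_return(policy, arm_estimates):
    """Estimated value of a stochastic policy over bandit arms."""
    return sum(p * m for p, m in zip(policy, arm_estimates))


def l1_distance(a, b):
    """L1 distance between two action distributions."""
    return sum(abs(x - y) for x, y in zip(a, b))


def perturb(policy, rng, scale=0.2):
    """Randomly perturb a policy and renormalize it to a distribution."""
    raw = [max(1e-6, p + rng.uniform(-scale, scale)) for p in policy]
    total = sum(raw)
    return [r / total for r in raw]


def select_safe_diverse(baseline, arm_estimates, rng, n_candidates=50, k=3):
    """Pick up to k policies (the baseline first) that pass the toy safety
    check and are mutually far apart.

    Assumption: the safety check compares point estimates only; a real
    system would use a high-confidence lower bound instead.
    """
    baseline_value = expected_return(baseline, arm_estimates)
    candidates = [perturb(baseline, rng) for _ in range(n_candidates)]
    safe = [c for c in candidates
            if expected_return(c, arm_estimates) >= baseline_value]

    chosen = [baseline]
    while safe and len(chosen) < k:
        # Farthest-point heuristic: take the safe candidate whose nearest
        # already-chosen policy is as far away as possible.
        best = max(safe, key=lambda c: min(l1_distance(c, s) for s in chosen))
        chosen.append(best)
        safe.remove(best)
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    baseline = [0.25, 0.25, 0.25, 0.25]
    arm_estimates = [0.1, 0.5, 0.3, 0.9]  # hypothetical value estimates
    for policy in select_safe_diverse(baseline, arm_estimates, rng):
        print([round(p, 3) for p in policy],
              round(expected_return(policy, arm_estimates), 3))
```

In this sketch the selected policies would then be deployed in turn to collect data, so exploration comes from the differences between them, while the safety check keeps each one's estimated value at or above the baseline.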
