Diverse Exploration for Fast and Safe Policy Improvement

Andrew Cohen; Lei Yu; Robert Wright

arXiv:1802.08331 · cs.LG · February 26, 2018

TL;DR

This paper introduces diverse exploration (DE), an exploration strategy for online reinforcement learning that learns and deploys a diverse set of safe policies, so that the policy improves quickly while online performance stays safe.

Contribution

The paper proposes diverse exploration (DE), supports it with theory explaining why diversity among behavior policies enables effective exploration without sacrificing exploitation, and validates it empirically within an online policy improvement framework.

Findings

01. Diversity in behavior policies enables effective exploration without sacrificing exploitation.

02. An online policy improvement framework implementing DE achieves both fast policy improvement and safe online performance.

03. Empirical results confirm the effectiveness of DE in online RL.

Abstract

We study an important yet under-addressed problem: quickly and safely improving policies in online reinforcement learning domains. As a solution, we propose a novel exploration strategy, diverse exploration (DE), which learns and deploys a diverse set of safe policies to explore the environment. We provide theory for DE explaining why diversity in behavior policies enables effective exploration without sacrificing exploitation. Our empirical study shows that an online policy improvement framework implementing the DE strategy can achieve both fast policy improvement and safe online performance.
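To make the idea of "a diverse set of safe policies" concrete, here is a minimal toy sketch in Python. It is not the paper's algorithm. The three-action one-step environment (`ARM_MEANS`), the random perturbation scheme (`perturb`), and the safety test (an importance-sampling estimate minus a normal-approximation margin, `lower_bound`) are all illustrative assumptions. The sketch only shows the general pattern: generate several candidate policies, keep those whose estimated lower bound is at least the baseline's performance, and deploy all of them for an equal share of interaction.

```python
import math
import random
import statistics

ARM_MEANS = [0.2, 0.5, 0.8]  # hypothetical one-step environment with 3 actions


def pull(arm, rng):
    """Sample a noisy reward for one action."""
    return rng.gauss(ARM_MEANS[arm], 0.1)


def collect(policies, n_per_policy, rng):
    """Deploy every policy in the set for an equal share of steps.

    Each record stores the behavior probability of the chosen action so
    that other policies can be evaluated off-policy afterwards.
    """
    data = []
    for probs in policies:
        for _ in range(n_per_policy):
            arm = rng.choices(range(len(probs)), weights=probs)[0]
            data.append((arm, probs[arm], pull(arm, rng)))
    return data


def lower_bound(candidate, data, z=1.645):
    """Importance-sampling estimate of the candidate's mean reward,
    minus z standard errors (an assumed normal-approximation bound)."""
    weighted = [candidate[arm] / b_prob * r for arm, b_prob, r in data]
    stderr = statistics.stdev(weighted) / math.sqrt(len(weighted))
    return statistics.mean(weighted) - z * stderr


def perturb(probs, rng, scale=0.3):
    """Create a candidate by randomly perturbing and renormalising."""
    raw = [max(1e-3, p + rng.uniform(-scale, scale)) for p in probs]
    total = sum(raw)
    return [x / total for x in raw]


def diverse_safe_set(baseline, data, baseline_value, n_candidates, rng):
    """Keep every candidate whose lower bound is at least the baseline value."""
    safe = []
    for _ in range(n_candidates):
        cand = perturb(baseline, rng)
        if lower_bound(cand, data) >= baseline_value:
            safe.append(cand)
    return safe or [baseline]  # fall back to the known baseline policy


rng = random.Random(0)
baseline = [1 / 3, 1 / 3, 1 / 3]
baseline_value = sum(p * m for p, m in zip(baseline, ARM_MEANS))
data = collect([baseline], 2000, rng)
policy_set = diverse_safe_set(baseline, data, baseline_value, 20, rng)
next_data = collect(policy_set, 200, rng)  # explore with the whole set
```

In this sketch, exploration comes from the differences among the accepted policies rather than from injected action noise, and the safety check is what keeps each deployed policy from being estimated as worse than the baseline. A real system would need a properly justified confidence bound and multi-step trajectories.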
