Curious Explorer: a provable exploration strategy in Policy Learning

Marco Miani; Maurizio Parton; Marco Romito

arXiv:2106.15503 · cs.LG · June 30, 2021

TL;DR

This paper introduces Curious Explorer, a provable exploration strategy for policy learning that improves convergence and sample efficiency without coverage assumptions, demonstrated through theoretical bounds and empirical results.

Contribution

Curious Explorer is a novel exploration method that adaptively improves exploration in policy gradient methods without relying on wide coverage assumptions.

Findings

1. Provides theoretical bounds on visiting poorly explored states.
2. Achieves PAC convergence and sample efficiency without coverage assumptions.
3. Improves performance of REINFORCE and TRPO in challenging environments.

Abstract

Having access to an exploring restart distribution (the so-called wide coverage assumption) is critical with policy gradient methods. This is due to the fact that, while the objective function is insensitive to updates in unlikely states, the agent may still need improvements in those states in order to reach a nearly optimal payoff. For this reason, wide coverage is used in some form when analyzing theoretical properties of practical policy gradient methods. However, this assumption can be unfeasible in certain environments, for instance when learning is online, or when restarts are possible only from a fixed initial state. In these cases, classical policy gradient algorithms can have very poor convergence properties and sample efficiency. In this paper, we develop Curious Explorer, a novel and simple iterative state space exploration strategy that can be used with any starting…
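
The abstract's point about unlikely states can be illustrated with a toy example. By the policy gradient theorem, the gradient contribution of a state is weighted by its discounted visitation probability d(s), so states that are rarely reached from a fixed start barely move the objective. The sketch below is not taken from the paper: the chain environment, the function name `discounted_visitation`, and all parameter values are assumptions chosen only for illustration. It computes d(s) for a uniform random policy on a short chain where "right" advances one state and "left" resets to the start.

```python
def discounted_visitation(n_states, p_right, gamma, start=0, horizon=2000):
    """Discounted visitation d(s) = (1 - gamma) * sum_t gamma^t * P(s_t = s).

    Hypothetical chain environment (an assumption, not the paper's setup):
    "right" moves one state forward (the last state stays put),
    "left" resets the agent to the fixed start state.
    """
    dist = [0.0] * n_states      # P(s_t = s) at the current time step
    dist[start] = 1.0            # restarts are only possible from `start`
    visit = [0.0] * n_states     # accumulates d(s)
    weight = 1.0 - gamma         # (1 - gamma) * gamma^t
    for _ in range(horizon):
        for s in range(n_states):
            visit[s] += weight * dist[s]
        nxt = [0.0] * n_states
        for s, mass in enumerate(dist):
            nxt[min(s + 1, n_states - 1)] += mass * p_right   # action "right"
            nxt[start] += mass * (1.0 - p_right)              # action "left"
        dist = nxt
        weight *= gamma
    return visit


# Uniform random policy on a 6-state chain with a fixed start state.
d = discounted_visitation(n_states=6, p_right=0.5, gamma=0.9)
for s, mass in enumerate(d):
    print(f"state {s}: d(s) = {mass:.4f}")
```

In this toy setting the visitation mass shrinks geometrically with distance from the start, which is the regime where a policy gradient update has almost no signal for improving behavior in the far states.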

Taxonomy

Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Machine Learning and Algorithms

Methods: Trust Region Policy Optimization · REINFORCE