Dr Jekyll and Mr Hyde: the Strange Case of Off-Policy Policy Updates

Romain Laroche; Remi Tachet

arXiv:2109.14727 · cs.LG · October 1, 2021 · 1 citation

TL;DR

This paper extends policy gradient theory to policy updates weighted by any state density, proves convergence to optimal policies under a necessary and sufficient condition on those densities, and introduces an agent, Dr Jekyll & Mr Hyde (JH), with separate exploitation and exploration policies.

Contribution

It generalizes policy gradient updates to arbitrary state densities, providing convergence guarantees and a new agent design with separate exploration and exploitation policies.

Findings

01 JH outperforms traditional methods in recovering from suboptimal convergence

02 Asymptotic convergence rates significantly improve on those in the policy gradient literature

03 Deep version shows promising results on simple problems

Abstract

The policy gradient theorem states that the policy should only be updated in states that are visited by the current policy, which leads to insufficient planning in the off-policy states, and thus to convergence to suboptimal policies. We tackle this planning issue by extending the policy gradient theory to policy updates with respect to any state density. Under these generalized policy updates, we show convergence to optimality under a necessary and sufficient condition on the updates' state densities, and thereby solve the aforementioned planning issue. We also prove asymptotic convergence rates that significantly improve those in the policy gradient literature. To implement the principles prescribed by our theory, we propose an agent, Dr Jekyll & Mr Hyde (JH), with a double personality: Dr Jekyll purely exploits while Mr Hyde purely explores. JH's independent policies allow to…
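
The planning issue described in the abstract can be illustrated with a small tabular sketch. The code below is not the authors' JH agent; it is a minimal example under stated assumptions: a softmax tabular policy, exact q-values, and a hypothetical three-state MDP invented for illustration. All function and variable names (`generalized_update`, `d_on`, `d_uniform`, and so on) are hypothetical. It compares the update direction weighted by the current policy's discounted state density with the same direction weighted by a uniform state density.

```python
import numpy as np

def softmax(theta):
    # Row-wise softmax over actions, shifted for numerical stability.
    z = theta - theta.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def q_values(P, R, pi, gamma):
    # Exact q^pi for a tabular MDP. P: (S, A, S), R: (S, A), pi: (S, A).
    S = R.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state transitions under pi
    r_pi = (pi * R).sum(axis=1)             # expected immediate reward under pi
    v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return R + gamma * P @ v

def discounted_density(P, pi, mu0, gamma):
    # Normalized discounted state density of pi started from mu0.
    S = len(mu0)
    P_pi = np.einsum("sa,sat->st", pi, P)
    return (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, mu0)

def generalized_update(theta, P, R, gamma, density):
    # Softmax policy-gradient direction with each state weighted by `density`.
    # Passing the on-policy density gives the usual policy gradient direction
    # (up to a constant factor); any other density gives a generalized update.
    pi = softmax(theta)
    q = q_values(P, R, pi, gamma)
    advantage = q - (pi * q).sum(axis=1, keepdims=True)
    return density[:, None] * pi * advantage

# Hypothetical MDP: state 0 is the start, state 2 is absorbing with zero reward.
# In state 0, action 0 pays 0.5 and ends; action 1 moves to state 1 for free.
# In state 1, action 0 pays 0 and action 1 pays 1; both end the episode.
S, A, gamma = 3, 2, 0.9
P = np.zeros((S, A, S))
P[0, 0, 2] = P[0, 1, 1] = 1.0
P[1, :, 2] = 1.0
P[2, :, 2] = 1.0
R = np.array([[0.5, 0.0], [0.0, 1.0], [0.0, 0.0]])
mu0 = np.array([1.0, 0.0, 0.0])

# The current policy strongly prefers action 0 in state 0, so state 1 is
# almost never visited, and it is undecided in state 1.
theta = np.array([[5.0, 0.0], [0.0, 0.0], [0.0, 0.0]])

d_on = discounted_density(P, softmax(theta), mu0, gamma)
d_uniform = np.full(S, 1.0 / S)

g_on = generalized_update(theta, P, R, gamma, d_on)
g_uniform = generalized_update(theta, P, R, gamma, d_uniform)

print("on-policy density:", d_on)
print("update in state 1 (on-policy weights):", g_on[1])
print("update in state 1 (uniform weights):  ", g_uniform[1])
```

In this toy problem the rarely visited state 1 receives a near-zero update under on-policy weighting, so the policy there barely improves and action 0 in state 0 keeps looking better than it should. Weighting by a density that puts mass on state 1 gives that state an update of the same sign but much larger magnitude.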

Code & Models

Repositories

microsoft/dr-jekyll-and-mr-hyde-the-strange-case-of-off-policy-policy-updates (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Test