Policy Gradient Algorithms Implicitly Optimize by Continuation

Adrien Bolland; Gilles Louppe; Damien Ernst

arXiv:2305.06851 · cs.LG · October 24, 2023 · 1 citation

Open Access

TL;DR

This paper offers a new theoretical perspective on policy-gradient algorithms in reinforcement learning: they implicitly optimize by continuation, so that exploration amounts to computing a continuation of the return of the current policy.

Contribution

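It formulates direct policy optimization in the optimization-by-continuation framework and shows that optimizing affine Gaussian policies with entropy regularization can be interpreted as implicitly optimizing deterministic policies by continuation.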

Findings

1. Policy gradients can be viewed as continuation methods.
2. Entropy regularization implicitly optimizes deterministic policies.
3. Policy variance should adapt based on history to improve exploration.

Abstract

Direct policy optimization in reinforcement learning is usually solved with policy-gradient algorithms, which optimize policy parameters via stochastic gradient ascent. This paper provides a new theoretical interpretation and justification of these algorithms. First, we formulate direct policy optimization in the optimization-by-continuation framework. The latter is a framework for optimizing nonconvex functions where a sequence of surrogate objective functions, called continuations, is locally optimized. Second, we show that optimizing affine Gaussian policies and performing entropy regularization can be interpreted as implicitly optimizing deterministic policies by continuation. Based on these theoretical results, we argue that exploration in policy-gradient algorithms consists in computing a continuation of the return of the policy at hand, and that the variance of policies should…
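
The idea of locally optimizing a sequence of continuations can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the scalar toy objective, the choice of Gaussian smoothing as the continuation, the fixed schedule of standard deviations, and the function names. The toy objective is chosen so that its Gaussian continuation has a closed form; in a reinforcement learning setting the corresponding expectation would typically be estimated from sampled trajectories instead.

```python
import numpy as np


def objective(x):
    """Hypothetical nonconvex objective to maximize; its global maximum is at x = 0."""
    return np.cos(3.0 * x) - 0.1 * x**2


def continuation_grad(x, sigma):
    """Gradient of the Gaussian continuation E[objective(x + sigma * eps)], eps ~ N(0, 1).

    For this toy objective the expectation has the closed form
    exp(-4.5 * sigma**2) * cos(3 x) - 0.1 * (x**2 + sigma**2),
    so sigma = 0 recovers the gradient of the original objective.
    """
    return -3.0 * np.exp(-4.5 * sigma**2) * np.sin(3.0 * x) - 0.2 * x


def gradient_ascent(x, sigma, steps=300, lr=0.05):
    """Locally optimize one continuation with plain gradient ascent."""
    for _ in range(steps):
        x = x + lr * continuation_grad(x, sigma)
    return x


def optimize_by_continuation(x0, sigmas=(1.5, 1.0, 0.5, 0.25, 0.0)):
    """Optimize a sequence of progressively less smoothed surrogates.

    Each stage starts from the solution of the previous, smoother stage.
    The schedule of sigmas is an arbitrary illustrative choice.
    """
    x = x0
    for sigma in sigmas:
        x = gradient_ascent(x, sigma)
    return x


# Direct ascent on the original objective stalls in a nearby local maximum,
# while the continuation schedule ends close to the global maximum at x = 0.
x_plain = gradient_ascent(4.0, sigma=0.0)
x_cont = optimize_by_continuation(4.0)
print(f"plain ascent:  x = {x_plain:.3f}, objective = {objective(x_plain):.3f}")
print(f"continuation:  x = {x_cont:.3f}, objective = {objective(x_cont):.3f}")
```

In this sketch the standard deviation of the smoothing plays a role loosely analogous to the variance of a Gaussian policy: a large value flattens the local maxima of the surrogate, and reducing it gradually brings the surrogate back to the original objective.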

Taxonomy

Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference

Methods: Entropy Regularization