A Novel Framework for Policy Mirror Descent with General Parameterization and Linear Convergence

Carlo Alfano; Rui Yuan; Patrick Rebeschini

arXiv:2301.13139 · stat.ML · February 14, 2024

TL;DR

This paper introduces a new policy optimization framework based on mirror descent that supports general parameterizations, guarantees linear convergence, and improves sample complexity with neural networks, validated on control tasks.

Contribution

It develops a novel mirror-descent-based policy optimization framework that accommodates general parameterizations and gives the first linear convergence guarantee for a policy-gradient-based method with general parameterization.
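To make "linear convergence" concrete, here is a minimal sketch of exact tabular policy mirror descent (PMD) with the KL (negative-entropy) mirror map on a tiny, hypothetical two-state MDP. It is not the paper's method: it uses exact Q-values and a tabular policy rather than a general parameterization, and the geometrically increasing step size `eta_t = eta0 / gamma**t` is an assumed schedule chosen to illustrate the linear-rate regime. The MDP arrays and the names `kl_pmd`, `evaluate`, `optimal_values` and `q_from_v` are illustrative only.

```python
import math

GAMMA = 0.9
S, A = 2, 2
# Hypothetical MDP: P[s][a] is the next-state distribution, R[s][a] the reward.
# Action 1 moves to state 1; taking action 1 in state 1 pays reward 1.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
R = [[0.0, 0.0],
     [0.0, 1.0]]


def q_from_v(v):
    """One-step lookahead: Q(s, a) = R(s, a) + gamma * sum_s' P(s'|s, a) V(s')."""
    return [[R[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in range(S))
             for a in range(A)] for s in range(S)]


def evaluate(pi, sweeps=1000):
    """Iterative policy evaluation; at gamma = 0.9, 1000 sweeps reach float precision."""
    v = [0.0] * S
    for _ in range(sweeps):
        q = q_from_v(v)
        v = [sum(pi[s][a] * q[s][a] for a in range(A)) for s in range(S)]
    return v


def optimal_values(sweeps=1000):
    """Value iteration for V*, used only to measure the optimality gap."""
    v = [0.0] * S
    for _ in range(sweeps):
        v = [max(row) for row in q_from_v(v)]
    return v


def kl_pmd(iters=30, eta0=1.0):
    """Exact tabular PMD with the KL mirror map:
    pi_{t+1}(a|s) ∝ pi_t(a|s) * exp(eta_t * Q_t(s, a))."""
    pi = [[1.0 / A] * A for _ in range(S)]  # uniform initial policy
    v_star = optimal_values()
    gaps = []
    for t in range(iters):
        v = evaluate(pi)
        gaps.append(max(vs - vp for vs, vp in zip(v_star, v)))
        q = q_from_v(v)
        eta = eta0 / GAMMA ** t  # assumed schedule: geometrically increasing
        new_pi = []
        for s in range(S):
            m = max(q[s])  # shift for numerical stability; cancels on normalizing
            w = [pi[s][a] * math.exp(eta * (q[s][a] - m)) for a in range(A)]
            z = sum(w)
            new_pi.append([x / z for x in w])
        pi = new_pi
    return pi, gaps


pi, gaps = kl_pmd()
print(["%.2e" % g for g in gaps[::5]])
```

`gaps` records the optimality gap max_s (V*(s) - V^pi(s)) before each update; plotting it on a log scale is a quick way to check whether the decrease looks geometric. With exact Q-values the gap is also non-increasing, since each KL step tilts every state's policy toward higher Q.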

Findings

1. Guarantees linear convergence for generally parameterized policies.
2. Improves sample complexity for shallow neural network policies.
3. Empirically validates the theoretical results on control tasks.

Abstract

Modern policy optimization methods in reinforcement learning, such as TRPO and PPO, owe their success to the use of parameterized policies. However, while theoretical guarantees have been established for this class of algorithms, especially in the tabular setting, the use of general parameterization schemes remains mostly unjustified. In this work, we introduce a novel framework for policy optimization based on mirror descent that naturally accommodates general parameterizations. The policy class induced by our scheme recovers known classes, e.g., softmax, and generates new ones depending on the choice of mirror map. Using our framework, we obtain the first result that guarantees linear convergence for a policy-gradient-based method involving general parameterization. To demonstrate the ability of our framework to accommodate general parameterization schemes, we provide its sample…
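The claim that the choice of mirror map determines the induced policy class can be illustrated in the simplest tabular, single-state case. The sketch below is an illustration under that assumption, not the paper's method or the code in the linked repository: it applies one policy mirror descent step with two standard mirror maps. With the negative entropy (KL divergence), the update is multiplicative and produces softmax policies; with the squared Euclidean norm, it becomes a projected step onto the probability simplex, which can assign exactly zero probability to some actions. The function names are hypothetical.

```python
import math


def softmax_pmd_step(pi, q, eta):
    """PMD step with the negative-entropy mirror map (KL divergence):
    pi_new(a) ∝ pi(a) * exp(eta * q(a)). Iterating from a uniform policy with
    a constant eta gives pi ∝ exp(eta * sum of past q), i.e. a softmax policy."""
    m = max(q)  # shift for numerical stability; cancels on normalizing
    w = [p * math.exp(eta * (qa - m)) for p, qa in zip(pi, q)]
    z = sum(w)
    return [x / z for x in w]


def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        css += uk
        t = (css - 1.0) / k
        if uk - t > 0:  # holds for a prefix of k; keep the last threshold
            theta = t
    return [max(x - theta, 0.0) for x in v]


def euclidean_pmd_step(pi, q, eta):
    """PMD step with the squared Euclidean mirror map:
    pi_new = Proj_simplex(pi + eta * q)."""
    return project_simplex([p + eta * qa for p, qa in zip(pi, q)])


pi0 = [1 / 3] * 3
q = [1.0, 0.5, 0.0]
print(softmax_pmd_step(pi0, q, 1.0))    # every action keeps positive mass
print(euclidean_pmd_step(pi0, q, 1.0))  # approximately [0.75, 0.25, 0.0]
```

Both steps solve the same proximal problem, maximizing eta * <q, p> minus a Bregman divergence to the current policy; only the divergence changes, which is why the resulting policies differ in shape (strictly positive softmax versus sparse projected policies).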

Code & Models

Repositories

c-alfano/approximate-mirror-policy-optimization (JAX, official)

Videos

A Novel Framework for Policy Mirror Descent with General Parameterization and Linear Convergence (SlidesLive)

Taxonomy

Topics: Advanced Memory and Neural Computing · Reinforcement Learning in Robotics · Fuel Cells and Related Materials

Methods: Entropy Regularization · Proximal Policy Optimization · Trust Region Policy Optimization