Convergence of Policy Mirror Descent Beyond Compatible Function Approximation
Uri Sherman, Tomer Koren, Yishay Mansour

TL;DR
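This paper develops a theoretical framework for policy mirror descent (PMD) that applies to general policy classes beyond compatible function approximation. It replaces the usual closure conditions with a strictly weaker variational gradient dominance assumption and relies on a novel notion of smoothness with respect to a local norm.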
Contribution
It introduces a variational gradient dominance condition and a local norm-based smoothness notion, extending PMD convergence analysis to broader policy classes.
Findings
Provides upper bounds on the rate of convergence to the best-in-class policy for general policy classes.
Recasts PMD as smooth non-convex optimization in non-Euclidean space.
Extends theoretical guarantees beyond tabular and strongly closed policy classes.
Abstract
Modern policy optimization methods roughly follow the policy mirror descent (PMD) algorithmic template, for which there are by now numerous theoretical convergence results. However, most of these either target tabular environments, or can be applied effectively only when the class of policies being optimized over satisfies strong closure conditions, which is typically not the case when working with parametric policy classes in large-scale environments. In this work, we develop a theoretical framework for PMD for general policy classes where we replace the closure conditions with a strictly weaker variational gradient dominance assumption, and obtain upper bounds on the rate of convergence to the best-in-class policy. Our main result leverages a novel notion of smoothness with respect to a local norm induced by the occupancy measure of the current policy, and casts PMD as a particular…
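For orientation, the following is a minimal sketch of the PMD template in the simplest setting the abstract mentions, a tabular environment with a KL (negative-entropy) regularizer. In that case the mirror step has the well-known closed form pi'(a|s) proportional to pi(a|s) * exp(eta * Q(s,a)). This is not the paper's general policy class setting or its analysis. The toy MDP (P, R), the step size eta, and the function names evaluate_q, pmd_step, and run_pmd are all hypothetical choices made for illustration.

```python
import math

def evaluate_q(P, R, pi, gamma, sweeps=500):
    """Approximate Q^pi by repeated Bellman backups (fixed-point iteration)."""
    n_s, n_a = len(R), len(R[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(sweeps):
        # State values under the current policy.
        V = [sum(pi[s][a] * Q[s][a] for a in range(n_a)) for s in range(n_s)]
        Q = [[R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
    return Q

def pmd_step(pi, Q, eta):
    """KL mirror step: pi'(a|s) proportional to pi(a|s) * exp(eta * Q(s,a))."""
    new_pi = []
    for s in range(len(pi)):
        m = max(Q[s])  # subtract the max for numerical stability
        w = [pi[s][a] * math.exp(eta * (Q[s][a] - m)) for a in range(len(pi[s]))]
        z = sum(w)
        new_pi.append([x / z for x in w])
    return new_pi

def run_pmd(P, R, gamma=0.9, eta=1.0, iters=50):
    """Run tabular PMD from the uniform policy; record V^pi at each iterate."""
    n_s, n_a = len(R), len(R[0])
    pi = [[1.0 / n_a] * n_a for _ in range(n_s)]
    values = []
    for _ in range(iters):
        Q = evaluate_q(P, R, pi, gamma)
        values.append([sum(pi[s][a] * Q[s][a] for a in range(n_a))
                       for s in range(n_s)])
        pi = pmd_step(pi, Q, eta)
    return pi, values

# Hypothetical 2-state, 2-action MDP: P[s][a][t] is the probability of
# moving from state s to state t under action a; R[s][a] is the reward.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.1, 0.9]]]
R = [[0.0, 1.0],
     [0.5, 2.0]]

pi, values = run_pmd(P, R)
```

In this toy problem action 1 is better in both states, so the iterates concentrate on it and the recorded values do not decrease from one iteration to the next. With a restricted parametric policy class this closed-form step is generally not available, which is the regime the abstract is concerned with.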
Taxonomy
Topics: Economic Policies and Impacts
