Global Optimality Guarantees For Policy Gradient Methods
Jalaj Bhandari, Daniel Russo

TL;DR
This paper establishes structural conditions under which policy gradient methods in control problems are guaranteed to find globally optimal solutions, despite the inherent non-convexity of the optimization landscape.
Contribution
It identifies specific structural properties that ensure no suboptimal stationary points exist and provides convergence guarantees under strengthened conditions.
Findings
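Under certain structural conditions, every stationary point of the policy gradient objective is globally optimal, even though the objective is non-convex.
The paper proves convergence rates when the objective satisfies a Polyak-Łojasiewicz (gradient dominance) condition.
Bounds on the optimality gap of any stationary point are provided when some of these conditions are relaxed.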
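For reference, the usual textbook statement of the Polyak-Łojasiewicz condition, and the one-step argument that turns it into a linear rate for gradient descent, is sketched below. This is the generic form for minimizing a cost $\ell(\theta)$ with minimum value $\ell^*$, assuming $\ell$ is $L$-smooth and a step size of $1/L$. The constants $\mu$ and $L$ are illustrative; the paper's own gradient dominance condition may be stated differently, for example with problem-dependent constants.

```latex
% Polyak-Lojasiewicz (gradient dominance) condition, for some \mu > 0:
\tfrac{1}{2}\,\|\nabla \ell(\theta)\|^2 \;\ge\; \mu\,\bigl(\ell(\theta) - \ell^*\bigr)
\quad \text{for all } \theta.

% Descent lemma for L-smooth \ell with update \theta_{k+1} = \theta_k - \tfrac{1}{L}\nabla \ell(\theta_k):
\ell(\theta_{k+1}) \;\le\; \ell(\theta_k) - \tfrac{1}{2L}\,\|\nabla \ell(\theta_k)\|^2
\;\le\; \ell(\theta_k) - \tfrac{\mu}{L}\,\bigl(\ell(\theta_k) - \ell^*\bigr).

% Subtract \ell^* from both sides and unroll:
\ell(\theta_k) - \ell^* \;\le\; \Bigl(1 - \tfrac{\mu}{L}\Bigr)^{k}\,\bigl(\ell(\theta_0) - \ell^*\bigr).
```

Under this condition, any stationary point ($\nabla \ell(\theta) = 0$) must satisfy $\ell(\theta) = \ell^*$. That is why gradient dominance rules out suboptimal stationary points and also yields a rate.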
Abstract
Policy gradient methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies. Unfortunately, even for simple control problems solvable by standard dynamic programming techniques, policy gradient algorithms face non-convex optimization problems and are widely understood to converge only to a stationary point. This work identifies structural properties, shared by several classic control problems, that ensure the policy gradient objective function has no suboptimal stationary points despite being non-convex. When these conditions are strengthened, this objective satisfies a Polyak-Łojasiewicz (gradient dominance) condition that yields convergence rates. We also provide bounds on the optimality gap of any stationary point when some of these conditions are relaxed.
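The core idea of gradient steps over a parameterized policy class can be shown with a toy example that is not taken from the paper. It uses a hypothetical three-armed bandit with a softmax policy and runs gradient ascent on expected reward, which is equivalent to descent on cost. The expected reward is non-concave in the parameters $\theta$, but it has no suboptimal stationary points at finite $\theta$, so exact gradient ascent still approaches the best arm. The reward values, step size, and iteration count are illustrative assumptions. The sketch also uses exact gradients rather than the stochastic estimates a practical method would use.

```python
import math


def softmax(theta):
    """Softmax policy pi_theta(a) over a finite action set (numerically stabilized)."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def expected_reward(theta, rewards):
    """Objective J(theta) = sum_a pi_theta(a) * r(a); non-concave in theta."""
    return sum(p * r for p, r in zip(softmax(theta), rewards))


def policy_gradient(theta, rewards):
    """Exact gradient of J for a softmax policy: dJ/dtheta_a = pi(a) * (r(a) - J)."""
    pi = softmax(theta)
    j = sum(p * r for p, r in zip(pi, rewards))
    return [p * (r - j) for p, r in zip(pi, rewards)]


def gradient_ascent(rewards, step_size=0.4, iters=5000):
    """Plain gradient ascent on J from the uniform policy (theta = 0)."""
    theta = [0.0] * len(rewards)
    for _ in range(iters):
        g = policy_gradient(theta, rewards)
        theta = [t + step_size * gi for t, gi in zip(theta, g)]
    return theta


# Hypothetical rewards: arm 1 is optimal.
rewards = [0.2, 1.0, 0.5]
theta = gradient_ascent(rewards)
pi = softmax(theta)
print([round(p, 4) for p in pi], round(expected_reward(theta, rewards), 4))
```

The step size 0.4 is chosen conservatively so that, for rewards in [0, 1], each step should not decrease the objective. In this softmax case progress near the optimum is only sublinear, because the gradient vanishes as the policy becomes deterministic. This is one reason stronger conditions such as gradient dominance are needed to obtain fast rates.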
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Statistical Methods and Inference
