Landscape of Policy Optimization for Finite Horizon MDPs with General State and Action

Xin Chen; Yifan Hu; Minda Zhao

arXiv:2409.17138 · math.OC · March 10, 2026


Open Access

TL;DR

This paper establishes a theoretical foundation for policy gradient methods in finite-horizon MDPs with general state and action spaces: it proves global convergence under the P{\L}K condition and provides sample complexity guarantees.

Contribution

It introduces the P{\L}K condition for finite-horizon MDPs, enabling the first sample complexity guarantees for complex stochastic control models.

Findings

01

Policy gradient methods converge globally under the P{\L}K condition.

02

Sample complexity is $\tilde{O}(\frac{1}{\epsilon})$ for $\epsilon$-optimal policies.

03

Numerical experiments show superior performance over benchmark algorithms.
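The first finding can be illustrated on a toy problem. The sketch below is not the paper's algorithm and involves no MDP: it runs plain gradient descent on the one-dimensional function f(x) = x^2 + 3 sin^2(x), a standard example that is nonconvex yet satisfies a Polyak-Łojasiewicz inequality, so the iterates still reach the global minimizer x = 0. The function, the constants `L_SMOOTH` and `MU`, and all helper names are assumptions chosen for illustration; `MU = 1/32` is a value commonly quoted for this example and is only checked numerically here.

```python
import math

# Toy objective (an assumption for illustration, not a model from the paper):
# nonconvex because f''(x) = 2 + 6*cos(2x) takes negative values,
# with a single stationary point at x = 0, where f attains its minimum 0.
def f(x):
    return x * x + 3.0 * math.sin(x) ** 2

def grad_f(x):
    # d/dx [x^2 + 3 sin^2 x] = 2x + 3 sin(2x)
    return 2.0 * x + 3.0 * math.sin(2.0 * x)

L_SMOOTH = 8.0     # upper bound on |f''(x)| = |2 + 6*cos(2x)|
MU = 1.0 / 32.0    # assumed PL constant for this toy function

def pl_holds(x, mu=MU):
    # PL inequality: 0.5 * |f'(x)|^2 >= mu * (f(x) - f_min), with f_min = 0.
    return 0.5 * grad_f(x) ** 2 >= mu * f(x) - 1e-12

def gradient_descent(x0, steps=200, step_size=1.0 / L_SMOOTH):
    # Plain gradient descent with the classical 1/L step size.
    x = x0
    history = [f(x)]
    for _ in range(steps):
        x -= step_size * grad_f(x)
        history.append(f(x))
    return x, history

x_final, history = gradient_descent(3.0)
print(f"final x = {x_final:.3e}, final f = {history[-1]:.3e}")
```

Starting from other points changes the path but not the limit, because the only stationary point of this function is its global minimizer.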

Abstract

Policy gradient methods are widely used in reinforcement learning. Yet, the nonconvexity of policy optimization poses significant challenges in understanding the global convergence of policy gradient methods. For a class of finite-horizon Markov Decision Processes (MDPs) with general state and action spaces, we identify a set of structural properties to establish a benign nonconvex landscape, the Polyak-{\L}ojasiewicz-Kurdyka (P{\L}K) condition of the policy optimization. Leveraging the P{\L}K condition, policy gradient methods converge to the globally optimal policy with a non-asymptotic rate despite nonconvexity. Our results apply to various control and operations models, including entropy-regularized tabular MDPs, Linear Quadratic Regulator problems, and both stochastic inventory models and stochastic cash balance problems with strongly convex costs. In these models, stochastic…
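For orientation, the classical Polyak-Łojasiewicz inequality and the linear rate it yields for gradient descent can be written as follows. This is the standard textbook form for a differentiable, $L$-smooth objective $f$ with minimum value $f^{*}$; the symbols $\theta$, $\mu$, and $L$ are assumptions introduced here, and the paper's P{\L}K condition for finite-horizon MDPs may be stated differently.

```latex
% Standard PL inequality with constant \mu > 0 (illustrative, not the paper's exact statement):
\[
  \tfrac{1}{2}\,\|\nabla f(\theta)\|^{2} \;\ge\; \mu\,\bigl(f(\theta) - f^{*}\bigr)
  \quad \text{for all } \theta .
\]
% For an L-smooth f, one step \theta_{k+1} = \theta_k - \tfrac{1}{L}\nabla f(\theta_k) gives
\[
  f(\theta_{k+1}) \;\le\; f(\theta_k) - \tfrac{1}{2L}\,\|\nabla f(\theta_k)\|^{2}
  \;\le\; f(\theta_k) - \tfrac{\mu}{L}\,\bigl(f(\theta_k) - f^{*}\bigr),
\]
% so the optimality gap contracts geometrically, with no convexity assumption:
\[
  f(\theta_{k+1}) - f^{*} \;\le\; \Bigl(1 - \tfrac{\mu}{L}\Bigr)\bigl(f(\theta_k) - f^{*}\bigr).
\]
```

The first inequality in the second display is the usual descent lemma for smooth functions; the PL inequality is what converts a small gradient into a small optimality gap.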


Taxonomy

Topics: Auction Theory and Applications

Methods: Sparse Evolutionary Training