Strongly-polynomial time and validation analysis of policy gradient methods

Caleb Ju; Guanghui Lan

arXiv:2409.19437 · cs.LG · March 24, 2026

TL;DR

This paper introduces a new advantage gap function that enables policy gradient methods to achieve strongly-polynomial convergence in MDPs and provides a practical way to validate RL solutions with certificates of optimality.

Contribution

The paper introduces the advantage gap function and shows how it yields strongly-polynomial convergence guarantees for policy gradient methods, together with a practical way to validate their solutions in reinforcement learning.

Findings

01. Policy gradient methods can solve MDPs in strongly-polynomial time.

02. In the stochastic setting, the advantage gap function closely approximates the optimality gap at each individual state.

03. The advantage gap function offers a practical, computable measure of optimality for RL solutions.

Abstract

This paper proposes a novel termination criterion, termed the advantage gap function, for finite state and action Markov decision processes (MDP) and reinforcement learning (RL). By incorporating this advantage gap function into the design of step size rules and deriving a new linear rate of convergence that is independent of the stationary state distribution of the optimal policy, we demonstrate that policy gradient methods can solve MDPs in strongly-polynomial time. To the best of our knowledge, this is the first time that such strong convergence properties have been established for policy gradient methods. Moreover, in the stochastic setting, where only stochastic estimates of policy gradients are available, we show that the advantage gap function provides close approximations of the optimality gap for each individual state and exhibits a sublinear rate of convergence at every state.…
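The abstract describes the advantage gap function only at a high level. The sketch below illustrates the general idea of a computable, per-state optimality certificate on a small tabular MDP. It rests on a hypothetical simplification that may differ from the paper's exact definition, sign convention, and stochastic estimator: reward maximization with the gap defined as g(s) = max_a Q^pi(s, a) - V^pi(s). Under that assumption, the performance difference lemma gives V*(s) - V^pi(s) <= max_s' g(s') / (1 - gamma) for every state s, so a small gap everywhere certifies near-optimality without knowing V*. All helper names (`advantage_gap`, `random_mdp`, and so on) are illustrative only.

```python
import random

# Hypothetical sketch, not the paper's exact definition: we assume reward
# maximization and define the gap at state s as max_a Q^pi(s, a) - V^pi(s).


def q_values(P, r, V, gamma):
    """Q(s, a) = r(s, a) + gamma * sum_t P(t | s, a) * V(t)."""
    n = len(r)
    return [[r[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n))
             for a in range(len(r[s]))] for s in range(n)]


def evaluate_policy(P, r, pi, gamma, tol=1e-12):
    """Iterative policy evaluation; returns V^pi as a list indexed by state."""
    V = [0.0] * len(r)
    while True:
        Q = q_values(P, r, V, gamma)
        V_new = [sum(p * q for p, q in zip(pi[s], Q[s])) for s in range(len(r))]
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new
        V = V_new


def optimal_values(P, r, gamma, tol=1e-12):
    """Value iteration for V*, used here only to check the certificate."""
    V = [0.0] * len(r)
    while True:
        V_new = [max(row) for row in q_values(P, r, V, gamma)]
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new
        V = V_new


def advantage_gap(P, r, pi, gamma):
    """Per-state gap g(s) = max_a Q^pi(s, a) - V^pi(s) >= 0, plus V^pi."""
    V = evaluate_policy(P, r, pi, gamma)
    Q = q_values(P, r, V, gamma)
    return [max(Q[s]) - V[s] for s in range(len(r))], V


def random_mdp(n_states, n_actions, seed=0):
    """Random tabular MDP: P[s][a] is a distribution over next states."""
    rng = random.Random(seed)
    P = []
    for _ in range(n_states):
        rows = []
        for _ in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            rows.append([x / total for x in w])
        P.append(rows)
    r = [[rng.random() for _ in range(n_actions)] for _ in range(n_states)]
    return P, r


S, A, gamma = 4, 3, 0.9
P, r = random_mdp(S, A)

# Uniform policy: its gap bounds how far it is from optimal at every state.
uniform = [[1.0 / A] * A for _ in range(S)]
gap, V = advantage_gap(P, r, uniform, gamma)
certificate = max(gap) / (1 - gamma)  # upper bound on V*(s) - V^pi(s)

# Reference values, only to check the bound (not needed by the certificate).
V_star = optimal_values(P, r, gamma)

# Greedy policy w.r.t. Q*: its gap should vanish up to solver tolerance.
Q_star = q_values(P, r, V_star, gamma)
best = [max(range(A), key=Q_star[s].__getitem__) for s in range(S)]
greedy = [[1.0 if a == best[s] else 0.0 for a in range(A)] for s in range(S)]
greedy_gap, _ = advantage_gap(P, r, greedy, gamma)

print("gap per state:", [round(g, 4) for g in gap])
print("certificate:", round(certificate, 4))
print("max greedy gap:", max(greedy_gap))
```

Value iteration appears here only to verify the bound numerically; the appeal of a gap-based certificate is that it needs only Q^pi and V^pi (or estimates of them), not V*.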

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Optimization and Search Problems · Simulation Techniques and Applications