Structure Matters: Dynamic Policy Gradient
Sara Klein, Xiangyuan Zhang, Tamer Başar, Simon Weissmann, Leif Döring

TL;DR
This paper introduces dynamic policy gradient (DynPG), a novel framework that combines dynamic programming with policy gradient methods to efficiently solve infinite-horizon MDPs by dynamically adjusting the horizon during training.
Contribution
The paper proposes DynPG, which integrates dynamic programming with policy gradient methods, providing the first non-asymptotic convergence rate analysis for this approach in tabular MDPs.
Findings
DynPG converges to the optimal policy for infinite-horizon MDPs.
The convergence rate of DynPG scales polynomially with the effective horizon 1/(1 - γ), where γ is the discount factor.
DynPG outperforms vanilla policy gradient in terms of convergence speed.
Abstract
In this work, we study γ-discounted infinite-horizon tabular Markov decision processes (MDPs) and introduce a framework called dynamic policy gradient (DynPG). The framework directly integrates dynamic programming with (any) policy gradient method, explicitly leveraging the Markovian property of the environment. DynPG dynamically adjusts the problem horizon during training, decomposing the original infinite-horizon MDP into a sequence of contextual bandit problems. By iteratively solving these contextual bandits, DynPG converges to the stationary optimal policy of the infinite-horizon MDP. To demonstrate the power of DynPG, we establish its non-asymptotic global convergence rate under the tabular softmax parametrization, focusing on the dependencies on salient but essential parameters of the MDP. By combining classical arguments from dynamic programming with more recent…
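The decomposition described in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes exact gradients instead of sampled ones, equal weighting of all states, and softmax parameters reset to zero at the start of each epoch. The function names `softmax` and `dynpg_sketch`, the array layout of `P` and `R`, and all default hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(theta):
    """Row-wise softmax over actions, shifted for numerical stability."""
    z = theta - theta.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def dynpg_sketch(P, R, gamma, epochs=50, inner_steps=200, lr=1.0):
    """Illustrative horizon-extension loop (assumptions: exact gradients,
    every state weighted equally, parameters reset each epoch).

    P: transition tensor of shape (S, A, S), P[s, a, s'] = p(s' | s, a).
    R: reward matrix of shape (S, A).
    Returns the last trained policy (S, A) and the value estimate (S,).
    """
    S, A = R.shape
    V = np.zeros(S)                # value of the empty 0-step policy
    pi = np.full((S, A), 1.0 / A)
    for _ in range(epochs):
        # Contextual bandit for this epoch: the context is the state and
        # the payoff of action a is the one-step lookahead on the current V.
        Q = R + gamma * (P @ V)    # shape (S, A)
        theta = np.zeros((S, A))
        for _ in range(inner_steps):
            pi = softmax(theta)
            # Exact softmax policy gradient of sum_a pi(a|s) Q(s, a).
            baseline = (pi * Q).sum(axis=1, keepdims=True)
            theta += lr * pi * (Q - baseline)
        pi = softmax(theta)
        # Extend the horizon by one step: value of the newly trained policy
        # followed by the policies from earlier epochs.
        V = (pi * Q).sum(axis=1)
    return pi, V
```

Each outer iteration solves one contextual bandit and then lengthens the horizon by one step; applying the last trained policy in every time step gives a stationary policy for the infinite-horizon problem. In a sampled setting the exact gradient would be replaced by whatever policy gradient estimator is preferred.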
Taxonomy
Topics: Economic Policies and Impacts
Methods: Softmax
