Structure Matters: Dynamic Policy Gradient

Sara Klein; Xiangyuan Zhang; Tamer Başar; Simon Weissmann; Leif Döring

arXiv:2411.04913·cs.LG·November 8, 2024

TL;DR

This paper introduces dynamic policy gradient (DynPG), a novel framework that combines dynamic programming with policy gradient methods to efficiently solve infinite-horizon MDPs by dynamically adjusting the horizon during training.

Contribution

The paper proposes DynPG, which integrates dynamic programming with policy gradient methods, providing the first non-asymptotic convergence rate analysis for this approach in tabular MDPs.

Findings

01 DynPG converges to the stationary optimal policy of the $\gamma$-discounted infinite-horizon MDP.

02 The convergence rate of DynPG scales polynomially with the effective horizon $(1-\gamma)^{-1}$.

03 DynPG outperforms vanilla policy gradient in terms of convergence speed.

Abstract

In this work, we study $\gamma$-discounted infinite-horizon tabular Markov decision processes (MDPs) and introduce a framework called dynamic policy gradient (DynPG). The framework directly integrates dynamic programming with (any) policy gradient method, explicitly leveraging the Markovian property of the environment. DynPG dynamically adjusts the problem horizon during training, decomposing the original infinite-horizon MDP into a sequence of contextual bandit problems. By iteratively solving these contextual bandits, DynPG converges to the stationary optimal policy of the infinite-horizon MDP. To demonstrate the power of DynPG, we establish its non-asymptotic global convergence rate under the tabular softmax parametrization, focusing on the dependencies on salient but essential parameters of the MDP. By combining classical arguments from dynamic programming with more recent…
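The decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `softmax` and `dynpg_sketch`, the uniform start-state distribution, the use of exact rather than sampled gradients, the fresh parameters in every epoch, and the fixed step counts are all assumptions made for illustration. Each epoch extends the horizon by one step and solves the resulting contextual bandit with a tabular softmax policy gradient.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax over actions, shifted for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def dynpg_sketch(P, r, gamma, epochs=50, pg_steps=200, lr=1.0):
    """Hypothetical DynPG-style loop for a known tabular MDP.

    P: transition array of shape (S, A, S); r: reward array of shape (S, A).
    Returns the last trained policy (S, A) and the value of the policy sequence (S,).
    """
    S, A = r.shape
    mu = np.full(S, 1.0 / S)      # assumption: uniform distribution over contexts (states)
    V = np.zeros(S)               # value of the horizon-0 problem
    theta = np.zeros((S, A))
    for _ in range(epochs):
        # Contextual bandit for this epoch: one-step reward plus the
        # discounted value of the policies trained in earlier epochs.
        Q = r + gamma * (P @ V)
        theta = np.zeros((S, A))  # assumption: restart parameters each epoch
        for _ in range(pg_steps):
            pi = softmax(theta)
            baseline = (pi * Q).sum(axis=1, keepdims=True)
            # Exact gradient of sum_s mu(s) sum_a pi(a|s) Q(s,a) w.r.t. theta.
            theta += lr * mu[:, None] * pi * (Q - baseline)
        pi = softmax(theta)
        V = (pi * Q).sum(axis=1)  # horizon grows by one step
    return softmax(theta), V
```

In this sketch the last trained policy is returned and would be applied as a stationary policy, mirroring the abstract's statement that DynPG converges to the stationary optimal policy. A sample-based variant would replace the exact gradient with a stochastic estimate.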

Taxonomy

Topics: Economic Policies and Impacts

Methods: Softmax