Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods

Sara Klein; Simon Weissmann; Leif Döring

arXiv:2310.02671·math.OC·May 7, 2024·1 cites

TL;DR

This paper introduces a dynamic policy gradient method for finite-horizon MDPs that trains the parameters backwards in time, which yields improved convergence bounds and exploits the problem structure better than training all parameters simultaneously.

Contribution

It proposes a novel dynamic policy gradient approach that incorporates backward training in finite-horizon MDPs and provides convergence analysis for this method.

Findings

01 Dynamic policy gradient outperforms standard methods in convergence speed.

02 The approach better exploits the structure of finite-time horizon problems.

03 Convergence bounds are improved using dynamic training.

Abstract

Markov Decision Processes (MDPs) are a formal framework for modeling and solving sequential decision-making problems. With finite time horizons, such problems are relevant, for instance, for optimal stopping or specific supply chain problems, but also in the training of large language models. In contrast to infinite-horizon MDPs, optimal policies are not stationary; a policy must be learned for every single epoch. In practice all parameters are often trained simultaneously, ignoring the inherent structure suggested by dynamic programming. This paper introduces a combination of dynamic programming and policy gradient called dynamic policy gradient, where the parameters are trained backwards in time. For the tabular softmax parametrisation we carry out the convergence analysis for simultaneous and dynamic policy gradient towards global optima, both in the exact and sampled gradient settings…
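The backward-in-time idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm or pseudocode: it assumes exact (not sampled) gradients, a tabular softmax policy per epoch, and a made-up toy MDP with two states and two actions. All names (`dynamic_policy_gradient`, `backward_induction`, `R`, `P`, `MU`, `HORIZON`) and the step size and iteration count are hypothetical choices for illustration only.

```python
import math

def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def q_values(r, P, v_next):
    """Q(s, a) = r(s, a) + sum_t P(t | s, a) * v_next(t)."""
    n_states = len(r)
    return [[r[s][a] + sum(P[s][a][t] * v_next[t] for t in range(n_states))
             for a in range(len(r[s]))] for s in range(n_states)]

def dynamic_policy_gradient(r, P, horizon, mu, step_size=1.0, n_steps=2000):
    """Train one tabular softmax policy per epoch, last epoch first.

    Assumption: exact gradients and a fixed state weighting `mu`.
    """
    n_states, n_actions = len(r), len(r[0])
    policies = [None] * horizon
    values = [None] * horizon
    v_next = [0.0] * n_states  # value after the final epoch is zero
    for h in reversed(range(horizon)):
        # Later epochs are already trained, so Q for epoch h is fixed here.
        q = q_values(r, P, v_next)
        theta = [[0.0] * n_actions for _ in range(n_states)]
        for _ in range(n_steps):
            for s in range(n_states):
                pi = softmax(theta[s])
                v = sum(pi[a] * q[s][a] for a in range(n_actions))
                for a in range(n_actions):
                    # Exact softmax policy gradient for state s, action a.
                    theta[s][a] += step_size * mu[s] * pi[a] * (q[s][a] - v)
        policies[h] = [softmax(theta[s]) for s in range(n_states)]
        v_next = [sum(policies[h][s][a] * q[s][a] for a in range(n_actions))
                  for s in range(n_states)]
        values[h] = v_next
    return policies, values

def backward_induction(r, P, horizon):
    """Optimal values per epoch via dynamic programming, for comparison."""
    values = [None] * horizon
    v_next = [0.0] * len(r)
    for h in reversed(range(horizon)):
        v_next = [max(row) for row in q_values(r, P, v_next)]
        values[h] = v_next
    return values

# Hypothetical toy MDP: R[s][a] is the reward, P[s][a][t] the transition law.
R = [[1.0, 0.0], [0.0, 2.0]]
P = [[[0.9, 0.1], [0.1, 0.9]], [[0.8, 0.2], [0.3, 0.7]]]
MU = [0.5, 0.5]
HORIZON = 3

policies, values = dynamic_policy_gradient(R, P, HORIZON, MU)
optimal = backward_induction(R, P, HORIZON)
for h in range(HORIZON):
    gaps = [optimal[h][s] - values[h][s] for s in range(len(R))]
    print("epoch", h, "policy", policies[h], "gap to optimum", gaps)
```

In this toy MDP, backward induction prefers action 1 in state 0 at the first epoch but action 0 in the later epochs, so a single stationary policy cannot be optimal. Training each epoch against the values of the already-trained later epochs is one way to respect that structure; a simultaneous variant would instead update the parameters of all epochs in every step.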

Videos

Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods · SlidesLive

Taxonomy

Topics: Machine Learning and Algorithms · Optimization and Search Problems · Reinforcement Learning in Robotics

Methods: Softmax