A Gauss-Newton Method for Markov Decision Processes
Thomas Furmston, Guy Lever

TL;DR
This paper introduces two Gauss-Newton methods for policy optimization in Markov Decision Processes, leveraging Hessian structure to improve convergence and performance over existing algorithms.
Contribution
The paper develops and analyzes two Gauss-Newton algorithms for MDP policy optimization, providing theoretical guarantees and demonstrating practical improvements.
Findings
Like the gradient, the Hessian of the MDP objective function exhibits useful structure.
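The approximate Hessians have desirable properties such as negative definiteness, so the proposed methods yield ascent directions and come with convergence guarantees.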
The second Gauss-Newton method outperforms related algorithms in experiments.
Abstract
Approximate Newton methods are a standard optimization tool which aim to maintain the benefits of Newton's method, such as a fast rate of convergence, whilst alleviating its drawbacks, such as computationally expensive calculation or estimation of the inverse Hessian. In this work we investigate approximate Newton methods for policy optimization in Markov Decision Processes (MDPs). We first analyse the structure of the Hessian of the objective function for MDPs. We show that, like the gradient, the Hessian exhibits useful structure in the context of MDPs and we use this analysis to motivate two Gauss-Newton Methods for MDPs. Like the Gauss-Newton method for non-linear least squares, these methods involve approximating the Hessian by ignoring certain terms in the Hessian which are difficult to estimate. The approximate Hessians possess desirable properties, such as negative definiteness,…
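For readers unfamiliar with the analogy the abstract draws, the following is a minimal sketch of the classical Gauss-Newton method for non-linear least squares, not the paper's MDP algorithms. It shows the core idea of replacing the exact Hessian with an easier term: for the objective 0.5 * ||r(theta)||^2 the exact Hessian is J^T J plus a sum of residuals times second derivatives, and Gauss-Newton keeps only J^T J. The toy model y = a * exp(b * x), the starting point, and the function names are illustrative assumptions.

```python
import numpy as np


def gauss_newton(residual, jacobian, theta0, n_iters=20):
    """Minimise 0.5 * ||r(theta)||^2 with undamped Gauss-Newton steps."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iters):
        r = residual(theta)   # residual vector, shape (n,)
        J = jacobian(theta)   # Jacobian of r, shape (n, p)
        grad = J.T @ r        # gradient of the objective
        # Exact Hessian: J^T J + sum_i r_i * Hessian(r_i).
        # Gauss-Newton drops the second term, which needs second derivatives.
        H_gn = J.T @ J
        theta = theta - np.linalg.solve(H_gn, grad)
    return theta


# Hypothetical toy problem: noise-free samples of y = a * exp(b * x).
x = np.linspace(0.0, 1.0, 20)
a_true, b_true = 2.0, -1.5
y = a_true * np.exp(b_true * x)


def residual(theta):
    a, b = theta
    return a * np.exp(b * x) - y


def jacobian(theta):
    a, b = theta
    e = np.exp(b * x)
    # Columns are d r / d a and d r / d b.
    return np.stack([e, a * x * e], axis=1)


theta_hat = gauss_newton(residual, jacobian, theta0=[1.0, -1.0])
print(theta_hat)
```

In this minimisation setting J^T J is positive semi-definite, which makes the step a descent direction whenever J has full column rank. The negative definiteness mentioned in the abstract plays the mirror-image role for a maximisation problem, where it gives an ascent direction.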
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Advanced Bandit Algorithms Research
