A Gauss-Newton Method for Markov Decision Processes

Thomas Furmston; Guy Lever

arXiv:1507.08271 · cs.AI · August 7, 2015

Open Access

TL;DR

This paper introduces two Gauss-Newton methods for policy optimization in Markov Decision Processes, leveraging Hessian structure to improve convergence and performance over existing algorithms.

Contribution

The paper develops and analyzes two Gauss-Newton algorithms for MDP policy optimization, providing theoretical guarantees and demonstrating practical improvements.

Findings

01. The Hessian in MDPs has useful structure similar to the gradient.

02. The proposed methods guarantee ascent directions and convergence.

03. The second Gauss-Newton method outperforms related algorithms in experiments.

Abstract

Approximate Newton methods are a standard optimization tool which aim to maintain the benefits of Newton's method, such as a fast rate of convergence, whilst alleviating its drawbacks, such as computationally expensive calculation or estimation of the inverse Hessian. In this work we investigate approximate Newton methods for policy optimization in Markov Decision Processes (MDPs). We first analyse the structure of the Hessian of the objective function for MDPs. We show that, like the gradient, the Hessian exhibits useful structure in the context of MDPs and we use this analysis to motivate two Gauss-Newton Methods for MDPs. Like the Gauss-Newton method for non-linear least squares, these methods involve approximating the Hessian by ignoring certain terms in the Hessian which are difficult to estimate. The approximate Hessians possess desirable properties, such as negative definiteness,…
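
For readers unfamiliar with the analogy the abstract draws, the following is a minimal sketch of the classical Gauss-Newton method for non-linear least squares, not the paper's MDP algorithms. For f(θ) = ½ Σ r_i(θ)², the exact Hessian is JᵀJ + Σ r_i ∇²r_i, and Gauss-Newton keeps only JᵀJ. The model y = a·exp(b·x), the data, the starting point, and the step-halving safeguard are all illustrative assumptions. Note that this sketch minimises, so the approximate Hessian is positive semi-definite, whereas the abstract describes a maximisation setting with negative definite approximations.

```python
import math

# Synthetic, noise-free data from y = 2.0 * exp(0.5 * x) (illustrative values only).
XS = [0.0, 0.5, 1.0, 1.5, 2.0]
YS = [2.0 * math.exp(0.5 * x) for x in XS]


def residuals(a, b):
    """r_i(a, b) = a * exp(b * x_i) - y_i."""
    return [a * math.exp(b * x) - y for x, y in zip(XS, YS)]


def loss(a, b):
    """f(a, b) = 0.5 * sum_i r_i^2."""
    return 0.5 * sum(r * r for r in residuals(a, b))


def gauss_newton_step(a, b):
    """Solve (J^T J) d = -J^T r for the two-parameter model."""
    r = residuals(a, b)
    # Columns of the Jacobian: dr_i/da and dr_i/db.
    ja = [math.exp(b * x) for x in XS]
    jb = [a * x * math.exp(b * x) for x in XS]
    # Gauss-Newton approximation J^T J; the sum_i r_i * Hess(r_i) term is dropped.
    h_aa = sum(u * u for u in ja)
    h_ab = sum(u * v for u, v in zip(ja, jb))
    h_bb = sum(v * v for v in jb)
    # Gradient J^T r.
    g_a = sum(u * ri for u, ri in zip(ja, r))
    g_b = sum(v * ri for v, ri in zip(jb, r))
    # Closed-form solution of the 2x2 linear system.
    det = h_aa * h_bb - h_ab * h_ab
    d_a = (-g_a * h_bb + g_b * h_ab) / det
    d_b = (-g_b * h_aa + g_a * h_ab) / det
    return d_a, d_b


def fit(a, b, iters=50, tol=1e-12):
    for _ in range(iters):
        d_a, d_b = gauss_newton_step(a, b)
        step, current = 1.0, loss(a, b)
        # Backtracking safeguard: halve the step until the loss does not increase.
        while loss(a + step * d_a, b + step * d_b) > current and step > 1e-8:
            step *= 0.5
        a, b = a + step * d_a, b + step * d_b
        if abs(step * d_a) + abs(step * d_b) < tol:
            break
    return a, b


a_hat, b_hat = fit(1.5, 0.3)
```

Because JᵀJ is positive definite whenever the Jacobian has full column rank, the computed direction is always a descent direction for the least-squares loss, which is the property the abstract's "desirable properties" remark mirrors for the maximisation case.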

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Advanced Bandit Algorithms Research