Inexact GMRES Policy Iteration for Large-Scale Markov Decision Processes

Matilde Gargiani; Dominic Liao-McPherson; Andrea Zanelli; John Lygeros

arXiv:2211.04299 · math.OC · November 9, 2022

TL;DR

This paper introduces inexact GMRES policy iteration, a method for large-scale Markov decision processes (MDPs) that performs the policy evaluation step approximately with GMRES, reducing computational cost while retaining local contraction guarantees.

Contribution

It proposes inexact policy iteration, a class of methods inspired by inexact semismooth Newton methods, and designs a GMRES-based instance of it for approximate policy evaluation in large-scale MDPs.
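A short derivation of the policy-iteration/Newton connection, written under assumed conventions: cost minimization, discount factor γ ∈ (0,1), finite states and actions. The notation g_π, P_π and the forcing term η_k are ours, and the forcing condition shown is the standard inexact Newton rule, which may differ from the exact criterion used in the paper.

```latex
% Bellman operator (cost minimization) and greedy policy \pi_k w.r.t. v_k
T(v)(s) = \min_{a} \Big[ g(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, v(s') \Big],
\qquad T(v_k) = g_{\pi_k} + \gamma P_{\pi_k} v_k .

% Policy iteration as a Newton step on F(v) = v - T(v),
% using the generalized Jacobian I - \gamma P_{\pi_k}
v_{k+1} = v_k - (I - \gamma P_{\pi_k})^{-1} \big( v_k - T(v_k) \big)
        = (I - \gamma P_{\pi_k})^{-1} g_{\pi_k} .

% Inexact variant (assumed form): accept any v_{k+1} whose policy-evaluation
% residual satisfies a forcing condition with \eta_k \in [0,1)
\big\| (I - \gamma P_{\pi_k})\, v_{k+1} - g_{\pi_k} \big\|
  \le \eta_k \, \big\| v_k - T(v_k) \big\| .
```

The second line follows because v_k − T(v_k) = (I − γP_{π_k}) v_k − g_{π_k}, so the exact Newton step lands on the value of the greedy policy, which is exactly what policy evaluation computes.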

Findings

01 Achieves a 5.8× speedup over policy iteration and a 2.2× speedup over optimistic policy iteration.

02 Demonstrates practical efficiency on an MDP with 10,000 states.

03 Maintains local contraction guarantees.

Abstract

Policy iteration enjoys a local quadratic rate of contraction, but its iterations are computationally expensive for Markov decision processes (MDPs) with a large number of states. In light of the connection between policy iteration and the semismooth Newton method and taking inspiration from the inexact variants of the latter, we propose inexact policy iteration, a new class of methods for large-scale finite MDPs with local contraction guarantees. We then design an instance based on the deployment of GMRES for the approximate policy evaluation step, which we call inexact GMRES policy iteration. Finally, we demonstrate the superior practical performance of inexact GMRES policy iteration on an MDP with 10,000 states, where it achieves a 5.8× and 2.2× speedup with respect to policy iteration and optimistic policy iteration, respectively.
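A minimal sketch of the idea in the abstract, not the authors' implementation: each outer step computes the greedy policy, then runs a few GMRES iterations on the policy evaluation system (I − γP_π) v = g_π, warm-started at the current value vector, instead of solving it exactly. Assumptions: cost minimization, a dense transition tensor laid out as P[a, s, t], a hand-written unrestarted GMRES, and a stopping rule of the standard inexact Newton form with forcing term eta. The function names bellman_greedy, gmres and inexact_gmres_pi are hypothetical.

```python
import numpy as np


def bellman_greedy(P, g, v, gamma):
    """Greedy policy and Bellman operator T(v) for a cost-minimizing MDP.

    Assumed layout: P[a, s, t] = Prob(next state t | state s, action a),
    g[s, a] = stage cost of taking action a in state s.
    """
    # Q[s, a] = g[s, a] + gamma * sum_t P[a, s, t] * v[t]
    Q = g + gamma * np.einsum("ast,t->sa", P, v)
    pi = np.argmin(Q, axis=1)
    return pi, Q[np.arange(v.size), pi]


def gmres(A, b, x0, max_iter, tol):
    """Unrestarted GMRES: minimize ||b - A x||_2 over x0 + Krylov subspace.

    Stops when the residual norm is <= tol, on breakdown, or after
    max_iter Arnoldi steps (in which case tol may not be reached).
    """
    n = b.size
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    if beta <= tol:
        return x0
    m = max(1, min(max_iter, n))
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / beta
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):  # modified Gram-Schmidt orthogonalization
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        # Small least-squares problem: min_y ||beta * e1 - H_j y||_2
        rhs = np.zeros(j + 2)
        rhs[0] = beta
        Hj = H[: j + 2, : j + 1]
        y, *_ = np.linalg.lstsq(Hj, rhs, rcond=None)
        if np.linalg.norm(rhs - Hj @ y) <= tol or H[j + 1, j] < 1e-14:
            break
        V[:, j + 1] = w / H[j + 1, j]
    return x0 + V[:, : j + 1] @ y


def inexact_gmres_pi(P, g, gamma, eta=0.1, gmres_iters=20, tol=1e-8, max_outer=200):
    """Hypothetical sketch of inexact policy iteration with GMRES evaluation.

    Returns (v, pi, number of approximate policy evaluations performed).
    """
    S = g.shape[0]
    states = np.arange(S)
    v = np.zeros(S)
    for k in range(max_outer):
        pi, Tv = bellman_greedy(P, g, v, gamma)
        bellman_res = np.linalg.norm(v - Tv)
        if bellman_res <= tol:
            return v, pi, k
        P_pi = P[pi, states, :]  # row s is P[pi[s], s, :]
        g_pi = g[states, pi]
        A = np.eye(S) - gamma * P_pi
        # Assumed forcing rule: stop GMRES once
        # ||(I - gamma P_pi) v - g_pi|| <= eta * ||v_k - T(v_k)||.
        v = gmres(A, g_pi, v, gmres_iters, eta * bellman_res)
    return v, pi, max_outer


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, nA, gamma = 200, 4, 0.9
    P = rng.random((nA, S, S))
    P /= P.sum(axis=2, keepdims=True)  # each row becomes a probability distribution
    g = rng.random((S, nA))
    v, pi, n_eval = inexact_gmres_pi(P, g, gamma)
    print(n_eval, np.max(np.abs(v - bellman_greedy(P, g, v, gamma)[1])))
```

With eta=0 and gmres_iters at least the number of states, each evaluation is exact up to round-off, which recovers standard policy iteration. A positive eta trades accuracy in each evaluation for cheaper outer iterations. A realistic large-scale version would use sparse matrix-vector products rather than forming I − γP_π densely.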
