Inexact Policy Iteration Methods for Large-Scale Markov Decision Processes

Matilde Gargiani; Robin Sieber; Efe Balta; Dominic Liao-McPherson; John Lygeros

arXiv:2404.06136·math.OC·April 10, 2024·1 cites

TL;DR

This paper introduces inexact policy iteration methods for large-scale Markov decision processes, analyzing their convergence and performance with various iterative solvers, and demonstrating their effectiveness in epidemiological health policy design.

Contribution

The paper develops a general framework for inexact policy iteration using semismooth Newton-inspired stopping conditions, analyzing convergence and applying it to large-scale MDPs.

Findings

01 Contraction guarantees depend on the stopping condition parameter.

02 Iterative solvers' contraction properties are enhanced by problem structure.

03 Numerical experiments show improved performance in epidemiological health policy design.

Abstract

We consider inexact policy iteration (iPI) methods for large-scale infinite-horizon discounted MDPs with finite spaces, a variant of policy iteration where the policy evaluation step is implemented inexactly using an iterative solver for linear systems. In the classical dynamic programming literature, a similar principle is deployed in optimistic policy iteration, where an a priori fixed number of iterations of value iteration is used to inexactly solve the policy evaluation step. Inspired by the connection between policy iteration and semismooth Newton's method, we investigate a class of iPI methods that mimic the inexact variants of semismooth Newton's method by adopting a parametric stopping condition to regulate the level of inexactness of the policy evaluation step. For this class of methods we discuss local and global convergence properties and derive a practical range of values for the…
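
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `P`, `g`, `gamma`, and `alpha` are assumptions introduced here, costs are minimized, the inner solver is a plain Richardson (value-iteration-like) sweep chosen for brevity, and the stopping rule (linear-system residual at most `alpha` times the Bellman residual) is the generic form used in inexact Newton methods, assumed here rather than taken from the paper.

```python
import numpy as np

def inexact_policy_iteration(P, g, gamma, alpha=0.1, tol=1e-8,
                             max_outer=1000, max_inner=10_000):
    """Sketch of inexact policy iteration for a finite discounted MDP.

    P: (A, S, S) transition probabilities, g: (A, S) stage costs (assumed layout).
    alpha: hypothetical stopping-condition parameter for policy evaluation.
    """
    n_states = P.shape[1]
    states = np.arange(n_states)
    V = np.zeros(n_states)
    for _ in range(max_outer):
        # Policy improvement: greedy policy with respect to the current V.
        Q = g + gamma * (P @ V)               # shape (A, S)
        pi = Q.argmin(axis=0)
        bellman_res = np.linalg.norm(Q.min(axis=0) - V, np.inf)
        if bellman_res <= tol:
            break
        P_pi = P[pi, states]                  # (S, S) transitions under pi
        g_pi = g[pi, states]                  # (S,) costs under pi
        # Inexact policy evaluation: iterate on (I - gamma * P_pi) V = g_pi
        # until the linear residual is small relative to the Bellman residual.
        for _ in range(max_inner):
            V = g_pi + gamma * (P_pi @ V)
            lin_res = np.linalg.norm(g_pi + gamma * (P_pi @ V) - V, np.inf)
            if lin_res <= alpha * bellman_res:
                break
    return V, pi
```

In this sketch, choosing `alpha` at least as large as `gamma` makes the inner loop stop after a single sweep, which reduces the scheme to value iteration, while smaller values of `alpha` push it toward exact policy iteration. The Richardson sweep could be swapped for any other iterative linear solver.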

Taxonomy

Topics: Access Control and Trust · Reinforcement Learning in Robotics · Simulation Techniques and Applications