Upper Bounds for All and Max-gain Policy Iteration Algorithms on Deterministic MDPs

Ritesh Goenka; Eashan Gupta; Sushil Khyalia; Pratyush Agarwal; Mulinti; Shaik Wajid; Shivaram Kalyanakrishnan

arXiv:2211.15602 · cs.DM · October 10, 2023

TL;DR

This paper establishes upper bounds on the running time of Policy Iteration (PI) algorithms on deterministic MDPs (DMDPs): one bound that covers the entire PI family and another that covers all max-gain switching variants. The analysis rests on graph-theoretic results that may be of independent interest.

Contribution

It provides the first non-trivial upper bound that applies to every PI algorithm on DMDPs, and confirms that a conjecture about Howard's PI on MDPs holds for DMDPs.

Findings

1. Upper bounds for all PI algorithms on DMDPs
2. Upper bounds for max-gain switching variants
3. Confirmation of a conjecture for Howard's PI on DMDPs
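The distinction between the whole PI family and its max-gain variants can be made concrete with a minimal Python sketch of a single switching step. It assumes the standard description of PI: a state is improvable when some action's Q-value strictly exceeds the state's current value, a PI step switches a non-empty subset of improvable states to improving actions, and a max-gain variant picks an action of maximal Q-value at each switched state. The paper's precise definitions may differ in details. The function names and the V/Q numbers below are hypothetical, chosen only for illustration, and are not results from the paper.

```python
def improvable_states(V, Q, eps=1e-12):
    """States with some action whose Q-value strictly exceeds the current value."""
    return [s for s in range(len(V)) if max(Q[s]) > V[s] + eps]


def improving_actions(V, Q, s, eps=1e-12):
    """Actions at state s that would strictly improve on the current value."""
    return [a for a, q in enumerate(Q[s]) if q > V[s] + eps]


def max_gain_switch(policy, V, Q, chosen, eps=1e-12):
    """Switch each chosen improvable state to an action of maximal Q-value.

    A general PI algorithm may pick any improving action at each chosen
    state; a max-gain variant always picks an argmax of Q[s].
    """
    improvable = set(improvable_states(V, Q, eps))
    if not chosen or not set(chosen) <= improvable:
        raise ValueError("chosen must be a non-empty subset of improvable states")
    new_policy = list(policy)
    for s in chosen:
        # Ties are broken toward the lowest action index.
        new_policy[s] = max(range(len(Q[s])), key=Q[s].__getitem__)
    return new_policy


# Hypothetical values for a 3-state, 3-action problem under policy [0, 0, 0]
# (so V[s] == Q[s][0] for every state s).
policy = [0, 0, 0]
V = [0.0, 0.0, 5.0]
Q = [[0.0, 1.0, 3.0],
     [0.0, 2.0, 0.5],
     [5.0, 4.0, 1.0]]

print(improvable_states(V, Q))                # [0, 1]
print(improving_actions(V, Q, 0))             # [1, 2]: either is a valid PI switch
print(max_gain_switch(policy, V, Q, [0, 1]))  # [2, 1, 0]: switch all improvable states
print(max_gain_switch(policy, V, Q, [1]))     # [0, 1, 0]: switch a single state
```

Switching every improvable state with the max-gain rule corresponds to the usual description of Howard's PI; other members of the family differ only in which subset they switch and, outside the max-gain variants, which improving action they choose.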

Abstract

Policy Iteration (PI) is a widely used family of algorithms to compute optimal policies for Markov Decision Problems (MDPs). We derive upper bounds on the running time of PI on Deterministic MDPs (DMDPs): the class of MDPs in which every state-action pair has a unique next state. Our results include a non-trivial upper bound that applies to the entire family of PI algorithms; another to all "max-gain" switching variants; and affirmation that a conjecture regarding Howard's PI on MDPs is true for DMDPs. Our analysis is based on certain graph-theoretic results, which may be of independent interest.
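The following is a minimal Python sketch of Howard's PI on a small DMDP. It assumes the discounted-reward criterion with discount factor `gamma` and takes Howard's PI to switch every improvable state to an action of maximal Q-value. Because every state-action pair has a unique next state, a fixed policy turns the states into a functional graph in which every path ends in a cycle, so policy evaluation can be done exactly by walking each path to its cycle. The names `evaluate`, `howard_pi`, `nxt`, and `rew` are hypothetical, and this is not the authors' code.

```python
def evaluate(nxt, rew, gamma, policy):
    """Exact discounted value of `policy` on a DMDP.

    nxt[s][a] is the unique next state and rew[s][a] the reward for taking
    action a in state s. Under a fixed policy each state has one successor,
    so every walk eventually enters a cycle.
    """
    n = len(policy)
    succ = [nxt[s][policy[s]] for s in range(n)]
    r = [rew[s][policy[s]] for s in range(n)]
    V = [None] * n
    for start in range(n):
        if V[start] is not None:
            continue
        path, pos = [], {}
        s = start
        # Walk forward until reaching an evaluated state or closing a cycle.
        while V[s] is None and s not in pos:
            pos[s] = len(path)
            path.append(s)
            s = succ[s]
        tail = path
        if V[s] is None:  # s lies on the current path: a new cycle was found
            cycle = path[pos[s]:]
            tail = path[:pos[s]]
            total = sum(gamma ** k * r[c] for k, c in enumerate(cycle))
            V[cycle[0]] = total / (1.0 - gamma ** len(cycle))
            # Remaining cycle states, back to front: V(c) = r(c) + gamma * V(succ(c)).
            for c in reversed(cycle[1:]):
                V[c] = r[c] + gamma * V[succ[c]]
        # States leading into the cycle (or into evaluated states), back to front.
        for t in reversed(tail):
            V[t] = r[t] + gamma * V[succ[t]]
    return V


def howard_pi(nxt, rew, gamma, policy, eps=1e-12):
    """Switch all improvable states to a max-Q action until none remain."""
    policy = list(policy)
    switches = 0
    while True:
        V = evaluate(nxt, rew, gamma, policy)
        new_policy = list(policy)
        for s in range(len(policy)):
            q = [rew[s][a] + gamma * V[nxt[s][a]] for a in range(len(nxt[s]))]
            best = max(range(len(q)), key=q.__getitem__)
            if q[best] > V[s] + eps:  # s is improvable
                new_policy[s] = best  # max-gain switch
        if new_policy == policy:
            return policy, V, switches
        policy = new_policy
        switches += 1


if __name__ == "__main__":
    # Two states, two actions: action 0 in state 0 and action 1 in state 1
    # are zero-reward self-loops; the other actions move between the states.
    nxt = [[0, 1], [0, 1]]
    rew = [[0.0, 1.0], [2.0, 0.0]]
    policy, V, switches = howard_pi(nxt, rew, 0.5, [0, 1])
    print(policy, switches)  # [1, 0] 1
```

On this example both states are improvable under the initial policy `[0, 1]`, so Howard's PI switches both in one iteration and reaches the policy `[1, 0]`, whose cycle 0 → 1 → 0 has values 8/3 and 10/3 with `gamma = 0.5`.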

Taxonomy

Topics: Formal Methods in Verification