Howard's Policy Iteration is Subexponential for Deterministic Markov Decision Problems with Rewards of Fixed Bit-size and Arbitrary Discount Factor

Dibyangshu Mukherjee; Shivaram Kalyanakrishnan

arXiv:2505.00795·cs.AI·May 5, 2025

TL;DR

This paper proves a subexponential upper bound on the running time of Howard's Policy Iteration on deterministic Markov Decision Problems whose rewards have a fixed bit-size, improving on the previously known exponential bounds.

Contribution

The paper establishes a subexponential upper bound for Howard's Policy Iteration on deterministic MDPs with fixed-bit rewards, independent of the discount factor.

Findings

01

HPI has a subexponential upper bound on DMDPs with fixed-bit rewards.

02

The same bound also holds for DMDPs whose rewards take only two possible values, even when those values are of arbitrary bit-size.

03

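The result narrows the gap between HPI's previously known upper bounds, which are exponential in the number of states, and its tightest known lower bound, which is only linear for MDPs with a constant number of actions per state.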

Abstract

Howard's Policy Iteration (HPI) is a classic algorithm for solving Markov Decision Problems (MDPs). HPI uses a "greedy" switching rule to update from any non-optimal policy to a dominating one, iterating until an optimal policy is found. Despite its introduction over 60 years ago, the best-known upper bounds on HPI's running time remain exponential in the number of states -- indeed even on the restricted class of MDPs with only deterministic transitions (DMDPs). Meanwhile, the tightest lower bound for HPI for MDPs with a constant number of actions per state is only linear. In this paper, we report a significant improvement: a subexponential upper bound for HPI on DMDPs, which is parameterised by the bit-size of the rewards, while independent of the discount factor. The same upper bound also applies to DMDPs with only two possible rewards (which may be of arbitrary size).
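To make the "greedy" switching rule concrete, here is a minimal sketch of HPI on a small deterministic MDP. It is an illustration under stated assumptions, not the paper's construction or analysis: the names `evaluate`, `howard_policy_iteration`, and `ACTIONS`, the three-state example, and the tie-breaking choice (keep the current action unless another is strictly better) are all hypothetical. Each state lists its actions as `(next_state, reward)` pairs, and values are computed exactly with rational arithmetic.

```python
from fractions import Fraction


def evaluate(policy, actions, gamma):
    """Solve V(s) = r(s, pi(s)) + gamma * V(next(s, pi(s))) exactly."""
    n = len(policy)
    # Augmented matrix for the linear system (I - gamma * P_pi) V = r_pi.
    rows = []
    for s in range(n):
        nxt, reward = actions[s][policy[s]]
        row = [Fraction(0)] * (n + 1)
        row[s] += 1
        row[nxt] -= gamma
        row[n] = Fraction(reward)
        rows.append(row)
    # Gauss-Jordan elimination; the system is nonsingular when gamma < 1.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = 1 / rows[col][col]
        rows[col] = [x * inv for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[s][n] for s in range(n)]


def howard_policy_iteration(actions, gamma, policy=None):
    """Switch every improvable state to its best action until none improves."""
    n = len(actions)
    policy = list(policy) if policy is not None else [0] * n
    iterations = 0
    while True:
        values = evaluate(policy, actions, gamma)
        new_policy = list(policy)
        for s in range(n):
            best_a, best_q = policy[s], values[s]
            for a, (nxt, reward) in enumerate(actions[s]):
                q = reward + gamma * values[nxt]
                if q > best_q:  # assumption: switch only on strict improvement
                    best_a, best_q = a, q
            new_policy[s] = best_a
        if new_policy == policy:
            return policy, values, iterations
        policy = new_policy
        iterations += 1


# Hypothetical 3-state DMDP with rewards in {0, 1}.
ACTIONS = [
    [(0, 0), (1, 1)],  # state 0: stay (reward 0) or move to state 1 (reward 1)
    [(1, 0), (2, 1)],  # state 1: stay (reward 0) or move to state 2 (reward 1)
    [(2, 1), (0, 0)],  # state 2: stay (reward 1) or move to state 0 (reward 0)
]

policy, values, iterations = howard_policy_iteration(ACTIONS, Fraction(1, 2))
print(policy, values, iterations)
```

The returned iteration count is the number of policy switches, which is the quantity that upper and lower bounds on HPI's running time refer to. Exact rationals are used here only so that the strict-improvement comparison is not affected by floating-point error.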

Taxonomy

Topics: Auction Theory and Applications · Supply Chain and Inventory Management · Optimization and Search Problems