Lower Bound on Howard Policy Iteration for Deterministic Markov Decision Processes

Ali Asadi; Krishnendu Chatterjee; Jakob de Raaij

arXiv:2506.12254·cs.AI·June 17, 2025

Open Access

TL;DR

This paper proves a linear lower bound on the number of iterations that Howard's policy iteration algorithm needs to solve deterministic Markov decision processes (DMDPs) with mean-payoff objectives, improving on the previously known sub-linear bound.

Contribution

The paper provides the first linear lower bound on Howard's policy iteration for DMDPs, improving upon the previous sub-linear bounds and deepening understanding of its computational limits.

Findings

01

Howard's algorithm requires at least a linear number of iterations in the worst case.

02

The previously known lower bound was sub-linear; this paper improves it to linear.

03

A large gap remains: the best known upper bound is exponential, even though the algorithm performs well in experimental studies.

Abstract

Deterministic Markov Decision Processes (DMDPs) are a mathematical framework for decision-making where the outcomes and future possible actions are deterministically determined by the current action taken. DMDPs can be viewed as a finite directed weighted graph, where in each step, the controller chooses an outgoing edge. An objective is a measurable function on runs (or infinite trajectories) of the DMDP, and the value for an objective is the maximal cumulative reward (or weight) that the controller can guarantee. We consider the classical mean-payoff (aka limit-average) objective, which is a basic and fundamental objective. Howard's policy iteration algorithm is a popular method for solving DMDPs with mean-payoff objectives. Although Howard's algorithm performs well in practice, as experimental studies suggested, the best known upper bound is exponential and the current known lower…
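The graph view described in the abstract can be made concrete with a minimal Python sketch of Howard-style policy iteration on a DMDP with a mean-payoff objective. It is an illustration under stated assumptions, not the construction or code from the paper: the names `evaluate` and `howard` are hypothetical, weights are assumed to be integers, every vertex is assumed to have at least one outgoing edge, the controller maximizes, and the potential (bias) is anchored at the smallest vertex on each cycle, which is one common convention. Each round evaluates the current policy and then switches every improvable vertex at once, comparing edges lexicographically by (gain, potential).

```python
from fractions import Fraction


def evaluate(policy, edges):
    """Gain and potential of every vertex under a fixed policy.

    edges[v] is a list of (target, weight) pairs; policy[v] indexes into it.
    Under a policy each vertex has one successor, so every walk ends in a cycle.
    """
    gain, bias = {}, {}
    for start in policy:
        path, pos = [], {}
        v = start
        # Walk until we reach an evaluated vertex or close a new cycle.
        while v not in gain and v not in pos:
            pos[v] = len(path)
            path.append(v)
            v = edges[v][policy[v]][0]
        if v not in gain:
            cycle = path[pos[v]:]
            path = path[:pos[v]]
            # Assumption: anchor the potential at the smallest vertex on the cycle.
            k = cycle.index(min(cycle))
            cycle = cycle[k:] + cycle[:k]
            total = sum(edges[u][policy[u]][1] for u in cycle)
            g = Fraction(total, len(cycle))  # mean weight of the cycle
            gain[cycle[0]], bias[cycle[0]] = g, Fraction(0)
            for u in reversed(cycle[1:]):
                t, w = edges[u][policy[u]]
                gain[u], bias[u] = g, w - g + bias[t]
        # Remaining path vertices lead into an already evaluated vertex.
        for u in reversed(path):
            t, w = edges[u][policy[u]]
            gain[u], bias[u] = gain[t], w - gain[t] + bias[t]
    return gain, bias


def howard(edges, policy=None):
    """Return (gain, policy, number of improvement rounds)."""
    policy = dict(policy) if policy else {v: 0 for v in edges}
    iterations = 0
    while True:
        gain, bias = evaluate(policy, edges)
        new_policy = dict(policy)
        for v, out in edges.items():
            best, best_key = policy[v], (gain[v], bias[v])
            for i, (t, w) in enumerate(out):
                key = (gain[t], w - gain[t] + bias[t])
                if key > best_key:  # strict: keep the current edge on ties
                    best, best_key = i, key
            new_policy[v] = best
        if new_policy == policy:
            return gain, policy, iterations
        policy = new_policy  # all improvable vertices switch simultaneously
        iterations += 1
```

For instance, calling `howard({0: [(0, 1), (1, 0)], 1: [(1, 5), (0, 0)]})` starts from the policy that picks the first edge everywhere. The lower bound discussed here concerns how large the returned round count can become on worst-case inputs; this sketch only counts rounds on whatever graph it is given.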

Taxonomy

Topics: Software Reliability and Analysis Research · Simulation Techniques and Applications · Reinforcement Learning in Robotics