Strongly Polynomial Time Complexity of Policy Iteration for $L_\infty$ Robust MDPs

Ali Asadi; Krishnendu Chatterjee; Ehsan Goharshady; Mehrdad Karrabi; Alipasha Montaseri; Carlo Pagano

arXiv:2601.23229·cs.AI·February 2, 2026

TL;DR

This paper proves that robust policy iteration runs in strongly-polynomial time for $(s, a)$-rectangular $L_\infty$ robust Markov decision processes with a fixed discount factor.

Contribution

It establishes the first strongly-polynomial time complexity result for policy iteration in $L_\infty$ robust MDPs with a fixed discount factor, solving a key open problem.

Findings

01

Robust policy iteration runs in strongly-polynomial time.

02

The result applies to $(s, a)$-rectangular $L_\infty$ RMDPs with a fixed discount factor.

03

This advances the computational understanding of robust MDPs.

Abstract

Markov decision processes (MDPs) are a fundamental model in sequential decision making. Robust MDPs (RMDPs) extend this framework by allowing uncertainty in transition probabilities and optimizing against the worst-case realization of that uncertainty. In particular, $(s, a)$-rectangular RMDPs with $L_{\infty}$ uncertainty sets form a fundamental and expressive model: they subsume classical MDPs and turn-based stochastic games. We consider this model with discounted payoffs. The existence of polynomial and strongly-polynomial time algorithms is a fundamental problem for these optimization models. For MDPs, linear programming yields polynomial-time algorithms for an arbitrary discount factor, and the seminal work of Ye established strongly-polynomial time for a fixed discount factor. The generalization of such results to RMDPs has remained an important open problem. In this work, we show…
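
To make the model in the abstract concrete, here is a minimal Python sketch of one robust Bellman backup for an $(s, a)$-rectangular $L_\infty$ RMDP. It is an illustration only, not the policy iteration algorithm analyzed in the paper. The function names, the nested-list data layout, the single shared radius `delta`, and the convention that the agent maximizes while the adversary minimizes are all assumptions made for this example. The inner worst case is computed with a standard greedy argument: start every probability at its lower bound, then hand the remaining mass to the lowest-value successor states first.

```python
from typing import List


def worst_case_transition(p0: List[float], v: List[float], delta: float) -> List[float]:
    """Return a distribution p minimizing sum(p[i] * v[i]) subject to
    max_i |p[i] - p0[i]| <= delta (hypothetical helper, not from the paper)."""
    lower = [max(0.0, q - delta) for q in p0]
    upper = [min(1.0, q + delta) for q in p0]
    p = lower[:]
    # Mass still to be placed after every state receives its lower bound.
    remaining = max(0.0, 1.0 - sum(lower))
    # Fill the cheapest successor states first, each up to its upper bound.
    for i in sorted(range(len(p0)), key=lambda j: v[j]):
        add = min(upper[i] - p[i], remaining)
        p[i] += add
        remaining -= add
        if remaining <= 0.0:
            break
    return p


def robust_backup(
    rewards: List[List[float]],        # rewards[s][a]
    nominal: List[List[List[float]]],  # nominal[s][a][s_next]
    v: List[float],
    gamma: float,
    delta: float,
) -> List[float]:
    """One robust Bellman backup: max over actions of the worst-case value."""
    new_v = []
    for s in range(len(rewards)):
        action_values = []
        for a in range(len(rewards[s])):
            p = worst_case_transition(nominal[s][a], v, delta)
            expected = sum(pi * vi for pi, vi in zip(p, v))
            action_values.append(rewards[s][a] + gamma * expected)
        new_v.append(max(action_values))
    return new_v
```

With `delta = 0` the uncertainty set collapses to the nominal transition function and the backup reduces to the classical MDP Bellman backup, which matches the abstract's remark that this model subsumes classical MDPs.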

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Risk and Portfolio Optimization