Strongly Polynomial Time Complexity of Policy Iteration for $L_\infty$ Robust MDPs
Ali Asadi, Krishnendu Chatterjee, Ehsan Goharshady, Mehrdad Karrabi, Alipasha Montaseri, Carlo Pagano

TL;DR
This paper proves that a robust policy iteration algorithm operates in strongly-polynomial time for a specific class of robust Markov decision processes with fixed discount factors, advancing the understanding of their computational complexity.
Contribution
It establishes the first strongly-polynomial time complexity result for policy iteration in $L_\infty$ robust MDPs with fixed discount factors, solving a key open problem.
Findings
Robust policy iteration runs in strongly-polynomial time.
The result applies to $(s, a)$-rectangular $L_\infty$ RMDPs with fixed discount factors.
This advances the computational understanding of robust MDPs.
Abstract
Markov decision processes (MDPs) are a fundamental model in sequential decision making. Robust MDPs (RMDPs) extend this framework by allowing uncertainty in transition probabilities and optimizing against the worst-case realization of that uncertainty. In particular, $(s, a)$-rectangular RMDPs with $L_\infty$ uncertainty sets form a fundamental and expressive model: they subsume classical MDPs and turn-based stochastic games. We consider this model with discounted payoffs. The existence of polynomial and strongly-polynomial time algorithms is a fundamental problem for these optimization models. For MDPs, linear programming yields polynomial-time algorithms for an arbitrary discount factor, and the seminal work of Ye established strongly-polynomial time for a fixed discount factor. The generalization of such results to RMDPs has remained an important open problem. In this work, we show…
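To make "optimizing against the worst-case realization" concrete, the following is a minimal sketch, not the paper's algorithm. It assumes each $(s, a)$ pair has a nominal distribution `p_hat` and that the uncertainty set is every distribution within $L_\infty$ distance `delta` of it. The function names, the parameter `delta`, and the data layout are illustrative assumptions that do not come from the paper. Under these assumptions the adversary's inner problem is a small linear program that a greedy pass solves: start every entry at its lower bound, then give the remaining probability mass to the lowest-value successor states first.

```python
def worst_case_expectation(p_hat, v, delta):
    """Minimize sum_i p[i] * v[i] over distributions p with
    max_i |p[i] - p_hat[i]| <= delta (hypothetical L_inf ball of radius delta)."""
    lower = [max(0.0, q - delta) for q in p_hat]
    upper = [min(1.0, q + delta) for q in p_hat]
    p = lower[:]                      # start every entry at its lower bound
    slack = 1.0 - sum(lower)          # probability mass still to be placed
    # Place the remaining mass on the cheapest successor states first.
    for i in sorted(range(len(v)), key=lambda j: v[j]):
        add = min(upper[i] - p[i], slack)
        p[i] += add
        slack -= add
        if slack <= 0.0:
            break
    return sum(pi * vi for pi, vi in zip(p, v)), p


def robust_backup(rewards, p_hat, v, delta, gamma):
    """One robust Bellman backup: the agent maximizes over actions while the
    adversary picks the worst transition distribution for each (s, a) pair.
    rewards[s][a] is a float; p_hat[s][a] is a list over successor states."""
    return [
        max(
            rewards[s][a]
            + gamma * worst_case_expectation(p_hat[s][a], v, delta)[0]
            for a in range(len(rewards[s]))
        )
        for s in range(len(rewards))
    ]
```

Because the adversary chooses independently for each $(s, a)$ pair, the inner minimization can be solved one pair at a time; this is what the rectangularity assumption provides in the sketch. Setting `delta` to 0 recovers the ordinary Bellman backup of a classical MDP.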
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Risk and Portfolio Optimization
