On the Complexity of Discounted Robust MDPs with $L_p$ Uncertainty Sets
Ali Asadi, Krishnendu Chatterjee, Alipasha Montaseri, Ali Shafiee

TL;DR
This paper investigates the computational complexity of solving discounted robust Markov decision processes with $L_p$ uncertainty sets, providing new polynomial-time algorithms for some cases and hardness results for others.
Contribution
The paper gives strongly polynomial algorithms for RMDPs with $L_1$ and $L_\infty$ uncertainty sets and proves hardness results for the intermediate $L_p$ sets with $1<p<\infty$.
Findings
Policy iteration is strongly polynomial for compact uncertainty sets with access to robust Markov chain solutions.
Efficient bounds are established for $L_1$ and $L_\infty$ uncertainty sets.
Hardness results are proved for $L_p$ uncertainty sets with $1<p<\infty$.
Abstract
A basic model in sequential decision making is the Markov decision process (MDP), which is extended to Robust MDPs (RMDPs) by allowing uncertainty in transition probabilities and optimizing against the worst-case transition probabilities from the uncertainty sets. The class of -rectangular RMDPs with $L_p$ uncertainty sets provides a flexible and expressive model for such problems. We study this class of RMDPs with a discounted-sum cost criterion and a constant discount factor. The existence of an efficient algorithm for this class is a fundamental theoretical question in optimization and sequential decision making. Previous results only establish a strongly polynomial-time algorithm for uncertainty sets. In this work, our main results are as follows: (a) we show that for any compact uncertainty set, the policy iteration algorithm for RMDPs is strongly polynomial with…
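To make the model in the abstract concrete, here is a minimal sketch of robust policy iteration for a discounted-cost RMDP. It is an illustration under stated assumptions, not the paper's algorithm or analysis: it assumes each state-action pair has its own $L_1$ ball of radius `kappa` around a nominal transition row (the rectangularity type is lost in the abstract text above), it uses the standard greedy solution for the $L_1$ worst case, and it replaces the robust Markov chain solution with plain fixed-point iteration. All function and parameter names are hypothetical.

```python
import numpy as np

def worst_case_l1(p0, v, kappa):
    """Maximize p @ v over distributions p with ||p - p0||_1 <= kappa (greedy)."""
    p = np.asarray(p0, dtype=float).copy()
    hi = int(np.argmax(v))
    # Moving mass m onto one state and off others changes the L1 norm by 2m.
    budget = min(kappa / 2.0, 1.0 - p[hi])
    p[hi] += budget
    for s in np.argsort(v):  # take the mass from the cheapest states first
        if budget <= 0:
            break
        if s == hi:
            continue
        take = min(p[s], budget)
        p[s] -= take
        budget -= take
    return p

def robust_q(P0, cost, v, kappa, gamma):
    """Worst-case one-step lookahead cost for every state-action pair."""
    S, A = cost.shape
    q = np.empty((S, A))
    for s in range(S):
        for a in range(A):
            p = worst_case_l1(P0[s, a], v, kappa)
            q[s, a] = cost[s, a] + gamma * (p @ v)
    return q

def evaluate_policy(P0, cost, policy, kappa, gamma, tol=1e-10):
    """Robust value of a fixed policy. Assumption: fixed-point iteration stands
    in for an exact robust Markov chain solver."""
    S = cost.shape[0]
    v = np.zeros(S)
    while True:
        v_new = robust_q(P0, cost, v, kappa, gamma)[np.arange(S), policy]
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new

def robust_policy_iteration(P0, cost, kappa, gamma):
    """Alternate robust evaluation and greedy improvement until the policy is stable."""
    S = cost.shape[0]
    policy = np.zeros(S, dtype=int)
    while True:
        v = evaluate_policy(P0, cost, policy, kappa, gamma)
        q = robust_q(P0, cost, v, kappa, gamma)
        greedy = np.argmin(q, axis=1)
        # Switch only on a strict improvement, so ties keep the current action.
        improved = q[np.arange(S), greedy] < v - 1e-9
        new_policy = np.where(improved, greedy, policy)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy

# Toy instance (made up for illustration): 2 states, 2 actions.
P0 = np.array([[[1.0, 0.0], [0.0, 1.0]],
               [[0.0, 1.0], [0.0, 1.0]]])
cost = np.array([[1.0, 0.5],
                 [0.0, 0.0]])
pi_nominal, v_nominal = robust_policy_iteration(P0, cost, kappa=0.0, gamma=0.5)
pi_robust, v_robust = robust_policy_iteration(P0, cost, kappa=0.4, gamma=0.5)
```

Setting `kappa=0.0` recovers ordinary policy iteration on the nominal MDP, and because every uncertainty set contains the nominal row, the robust values can never fall below the nominal ones.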
