Revisiting Subgradient Dominance in Robust MDPs: Counterexamples, Hardness, and Sufficient Conditions
Toshinori Kitamura, Arnob Ghosh, Alex Ayoub, Thang D. Chu, Csaba Szepesvári

TL;DR
This paper critically examines the assumptions behind projected subgradient descent (PSD) for robust MDPs (RMDPs), presenting counterexamples to subgradient dominance, computational hardness results, and sufficient conditions for convergence.
Contribution
It disproves prior claims that RMDPs with general uncertainty sets are subgradient-dominant and identifies specific conditions under which convergence is guaranteed.
Findings
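Counterexamples show that subgradient dominance does not hold in general, even when the uncertainty set has only two elements.
Finding an ε-optimal policy is NP-hard in certain RMDP settings, even when subgradients are efficiently computable.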
Unique worst-case transition or action-value functions ensure subgradient dominance.
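For reference, here is a sketch of the property the findings refer to, written in a form modeled on the gradient-dominance condition from the policy-gradient literature. The notation (cost objective J, uncertainty set U, initial-state distribution ρ, policy set Π, constant C > 0) is our assumption, and the paper's exact statement and constants may differ.

```latex
J(\pi) = \max_{P \in \mathcal{U}} \rho^\top V_P^{\pi},
\qquad
J(\pi) - \min_{\pi' \in \Pi} J(\pi')
  \le C \max_{\pi' \in \Pi} \langle g, \pi - \pi' \rangle
  \quad \text{for some } g \in \partial J(\pi).
```

Read this way, subgradient dominance would make every stationary point globally optimal, since at a stationary point some subgradient makes the right-hand side non-positive. A suboptimal strict local minimum therefore rules the property out.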
Abstract
Projected subgradient descent (PSD) has gained popularity for solving robust Markov decision processes (RMDPs) because it applies to a broader class of uncertainty sets than traditional dynamic programming. Existing work claims that RMDPs with a general compact uncertainty set satisfy the subgradient dominance property, under which exact PSD converges to an ε-optimal policy in a polynomial number of updates (e.g., Wang et al., 2023). We show that these claims are incorrect. Even when the uncertainty set has cardinality two, the RMDP objective is not subgradient-dominant and can admit suboptimal strict local minima. Moreover, we prove that finding an ε-optimal policy can be NP-hard even in settings where subgradients are efficiently computable: (i) finite transition uncertainty sets and (ii) -rectangular finite transition uncertainty sets with finite cost…
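To make the objects concrete, here is a minimal sketch of exact PSD on a tabular RMDP with a finite uncertainty set. It assumes costs are minimized, policies use the direct (tabular) parameterization, and the objective is J(π) = max over kernels P in the set of ρᵀV_P^π. Each step takes the policy gradient under one worst-case kernel (a Clarke subgradient of the pointwise maximum) and projects each state's action distribution back onto the simplex. The function names `project_simplex`, `evaluate`, `robust_cost`, and `psd`, the step size `eta`, and the iteration count are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of a 1-D vector onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    r = k[u - (css - 1.0) / k > 0][-1]  # number of coordinates kept positive
    theta = (css[r - 1] - 1.0) / r
    return np.maximum(v - theta, 0.0)


def evaluate(P, c, pi, rho, gamma):
    """Exact value V, action-value Q and discounted occupancy d of pi under kernel P.

    Shapes (assumed): P (S, A, S), c (S, A), pi (S, A), rho (S,).
    """
    S = c.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state kernel under pi
    c_pi = np.sum(pi * c, axis=1)           # expected one-step cost under pi
    M = np.eye(S) - gamma * P_pi
    V = np.linalg.solve(M, c_pi)
    Q = c + gamma * P @ V                    # (S, A, S) @ (S,) -> (S, A)
    d = (1 - gamma) * np.linalg.solve(M.T, rho)
    return V, Q, d


def robust_cost(Ps, c, pi, rho, gamma):
    """J(pi): worst expected discounted cost over the finite uncertainty set Ps."""
    return max(rho @ evaluate(P, c, pi, rho, gamma)[0] for P in Ps)


def psd(Ps, c, rho, gamma, eta=0.1, iters=500):
    """Hypothetical exact projected subgradient descent on the direct policy parameters."""
    S, A = c.shape
    pi = np.full((S, A), 1.0 / A)  # start from the uniform policy
    for _ in range(iters):
        evals = [evaluate(P, c, pi, rho, gamma) for P in Ps]
        # Pick one active (worst-case) kernel; its gradient is a subgradient of J.
        k = int(np.argmax([rho @ V for V, _, _ in evals]))
        _, Q, d = evals[k]
        # Policy-gradient formula for direct parameterization: d(s) Q(s, a) / (1 - gamma)
        g = d[:, None] * Q / (1 - gamma)
        # Descent step, then project each state's row back onto the simplex.
        pi = np.array([project_simplex(row) for row in pi - eta * g])
    return pi
```

With a single state and two identical kernels, the loop moves all probability mass to the zero-cost action. The paper's counterexamples show that no such guarantee holds in general: even with two kernels, the objective can have suboptimal strict local minima at which this loop may stall.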
