Computational Hardness of Static Distributionally Robust Markov Decision Processes
Yan Li

TL;DR
This paper shows that finding optimal policies in static distributionally robust Markov decision processes (DRMDPs) is computationally hard, proving NP-hardness for both non-randomized and randomized Markovian policy classes.
Contribution
The paper establishes NP-hardness results for static DRMDPs whose ambiguity set contains only two transition kernels, highlighting fundamental computational challenges.
Findings
Optimal policy computation is NP-hard for non-randomized Markovian policies.
NP-hardness persists even with only two transition kernels in the ambiguity set.
Over randomized Markovian policies, the robust value function can have sub-optimal strict local minimizers, and finding the optimal policy remains NP-hard.
Abstract
We present some hardness results on finding the optimal policy for the static formulation of distributionally robust Markov decision processes. We construct problem instances such that when the considered policy class is Markovian and non-randomized, finding the optimal policy is NP-hard. When the considered policy class is Markovian and randomized, the robust value function possesses sub-optimal strict local minimizers, and finding the optimal policy is also NP-hard. The considered instances involve an ambiguity set with only two transition kernels.
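To make the static formulation concrete, the following is a minimal sketch of brute-force policy search, not the construction used in the paper. It assumes a finite-horizon, cost-minimizing MDP (the abstract does not state the horizon or the objective; "minimizers" suggests costs), in which nature picks one kernel from the ambiguity set up front and keeps it for every time step. All names (`policy_cost`, `robust_cost`, `brute_force_static_drmdp`) and the data layout are hypothetical.

```python
from itertools import product


def policy_cost(kernel, cost, policy, init_dist):
    """Expected total cost of a non-randomized Markovian policy under one kernel.

    kernel[s][a] is a list of next-state probabilities, cost[s][a] is the
    stage cost, and policy is a tuple of decision rules (one per time step),
    each mapping a state index to an action index.
    """
    n_states = len(cost)
    values = [0.0] * n_states  # terminal value is assumed to be zero
    for rule in reversed(policy):  # backward induction over time steps
        values = [
            cost[s][rule[s]]
            + sum(p * v for p, v in zip(kernel[s][rule[s]], values))
            for s in range(n_states)
        ]
    return sum(m * v for m, v in zip(init_dist, values))


def robust_cost(kernels, cost, policy, init_dist):
    """Static robust cost: the worst case over kernels fixed for all steps."""
    return max(policy_cost(k, cost, policy, init_dist) for k in kernels)


def brute_force_static_drmdp(kernels, cost, horizon, init_dist):
    """Enumerate every non-randomized Markovian policy and keep the best one."""
    n_states = len(cost)
    n_actions = len(cost[0])
    rules = list(product(range(n_actions), repeat=n_states))
    best = None
    for policy in product(rules, repeat=horizon):
        value = robust_cost(kernels, cost, policy, init_dist)
        if best is None or value < best[0]:
            best = (value, policy)
    return best
```

The enumeration visits `n_actions ** (n_states * horizon)` policies, so it is only usable on toy instances. This does not prove hardness by itself, but it illustrates why the inner maximum over kernels rules out a simple backward-induction shortcut: the worst-case kernel depends on the whole policy, not on each time step separately.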
