Distribution-based objectives for Markov Decision Processes
S. Akshay, Blaise Genest, Nikhil Vyas

TL;DR
This paper studies distribution-based safety objectives in Markov Decision Processes (MDPs), establishing decidability and complexity bounds for existential and universal safety, and comparing them with the same problems for probabilistic finite automata (PFAs).
Contribution
It introduces and analyzes the complexity of safety problems for distribution-based objectives in MDPs and PFAs, revealing decidability results and tight bounds.
Findings
Existential safety in MDPs is decidable in PTIME.
Universal safety in MDPs is co-NP-complete.
Existential safety in PFAs is undecidable, while universal safety is in EXPTIME.
Abstract
We consider distribution-based objectives for Markov Decision Processes (MDP). This class of objectives gives rise to an interesting trade-off between full and partial information. As in full observation, the strategy in the MDP can depend on the state of the system, but similar to partial information, the strategy needs to account for all the states at the same time. In this paper, we focus on two safety problems that arise naturally in this context, namely, existential and universal safety. Given an MDP A and a closed and convex polytope H of probability distributions over the states of A, the existential safety problem asks whether there exists some distribution d in H and a strategy of A, such that starting from d and repeatedly applying this strategy keeps the distribution forever in H. The universal safety problem asks whether for all distributions in H, there exists such a…
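The definition above can be illustrated with a small sketch. It shows one distribution update under a state-dependent strategy and a bounded check that the distribution stays inside H. Everything concrete here is an assumption made for illustration and does not come from the paper: the two-state MDP `P`, the half-space encoding of H, the helper names `step`, `in_polytope` and `stays_safe`, and the choice of a fixed randomized rule per state as the strategy. The check only covers a finite horizon, so it is not a decision procedure for either safety problem.

```python
# Hypothetical 2-state MDP: P[action][s][t] is the probability of moving s -> t.
P = {
    "a": [[0.0, 1.0], [0.0, 1.0]],  # action a sends every state to state 1
    "b": [[1.0, 0.0], [0.0, 1.0]],  # action b leaves every state where it is
}

def step(d, sigma):
    """One update of distribution d; state s picks actions with weights sigma[s]."""
    n = len(d)
    nxt = [0.0] * n
    for s in range(n):
        for action, weight in sigma[s].items():
            for t in range(n):
                nxt[t] += d[s] * weight * P[action][s][t]
    return nxt

def in_polytope(d, constraints, tol=1e-9):
    """H as half-spaces: each (coeffs, bound) means coeffs . d <= bound."""
    return all(
        sum(c * x for c, x in zip(coeffs, d)) <= bound + tol
        for coeffs, bound in constraints
    )

def stays_safe(d, sigma, constraints, horizon):
    """Bounded check only: is the distribution in H for the first `horizon` steps?"""
    for _ in range(horizon):
        if not in_polytope(d, constraints):
            return False
        d = step(d, sigma)
    return in_polytope(d, constraints)

# Assumed polytope H = { d : 0.4 <= d[0] <= 0.6 }.
H = [([1.0, 0.0], 0.6), ([-1.0, 0.0], -0.4)]
d0 = [0.6, 0.4]

always_b = [{"b": 1.0}, {"b": 1.0}]  # same action in every state
always_a = [{"a": 1.0}, {"a": 1.0}]  # same action in every state
mixed = [{"b": 1.0}, {"a": 1.0}]     # the action depends on the state

safe_b = stays_safe(d0, always_b, H, 10)
safe_a = stays_safe(d0, always_a, H, 10)
safe_mixed = stays_safe(d0, mixed, H, 10)
```

In this toy instance `always_b` and `mixed` keep the distribution at its starting point, while `always_a` moves all mass to state 1 and leaves H after one step. The `mixed` rule uses the state-dependence that the abstract attributes to full observation, yet safety is still judged on the whole distribution at once.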
Taxonomy
Topics: Formal Methods in Verification · Advanced Software Engineering Methodologies · Software Reliability and Analysis Research
