Multi-Environment MDPs with Prior and Universal Semantics
Benjamin Bordais, Jean-François Raskin

TL;DR
This paper investigates multi-environment MDPs with prior and universal semantics, providing new algorithms for value approximation, clarifying their relationship, and connecting prior-MEMDPs to a tractable subclass of POMDPs.
Contribution
It introduces algorithms for approximating values in MEMDPs with prior semantics, clarifies the relation between prior and universal semantics, and links prior-MEMDPs to a tractable subclass of POMDPs.
Findings
The prior value can be approximated to any precision with a space-efficient algorithm; the associated gap problem is decidable in PSPACE.
Universal value equals the infimum of prior values over beliefs.
Prior-MEMDPs form a tractable subclass of POMDPs with non-increasing belief entropy.
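The second finding can be written symbolically. The notation below is an assumption made for illustration and is not taken from the paper: $\mathbb{P}^{\sigma}_{e}(\varphi)$ denotes the probability that strategy $\sigma$ satisfies the objective $\varphi$ in environment $e$, $E$ is the finite set of environments, and $b$ ranges over distributions on $E$.

```latex
% Prior value for a belief b: expected satisfaction probability under b.
\mathrm{val}_{b} \;=\; \sup_{\sigma} \sum_{e \in E} b(e)\, \mathbb{P}^{\sigma}_{e}(\varphi)

% Universal value: the strategy is judged against the worst environment.
\mathrm{val}_{\forall} \;=\; \sup_{\sigma} \min_{e \in E} \mathbb{P}^{\sigma}_{e}(\varphi)

% Stated finding: the universal value is the infimum of the prior values.
\mathrm{val}_{\forall} \;=\; \inf_{b} \mathrm{val}_{b}
```

One direction is immediate from these definitions: for any fixed $\sigma$ and $b$, a $b$-weighted average is at least the minimum over environments, so $\mathrm{val}_{\forall} \le \mathrm{val}_{b}$ for every $b$. The finding asserts that the infimum closes the gap.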
Abstract
Multiple-environment Markov decision processes (MEMDPs) equip an MDP with several probabilistic transition functions (one per possible environment) so that the state is observable but the environment is not. Previous work studies two semantics: (i) the universal semantics, where an adversary picks the environment; and (ii) the prior semantics, where the environment is drawn once before execution from a fixed distribution. We clarify the relation between these semantics. For parity objectives, we show that the qualitative questions, i.e. value one, coincide, and we develop a new algorithm for the general value of MEMDPs with prior semantics. In particular, we show that the prior value of an MEMDP with a parity objective can be approximated to any precision with a space-efficient algorithm; equivalently, the associated gap problem is decidable in PSPACE when probabilities are given in…
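Under the prior semantics described above, the state is observed but the environment is not, so an agent can maintain a belief over environments and revise it after each observed transition. The following is a minimal sketch of that Bayesian update, not the paper's algorithm. The encoding of transition functions as nested dictionaries, the name `update_belief`, and the two-environment example `envs` are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: one dict per environment, mapping
# (state, action) -> {successor: probability}.
TransitionFn = Dict[Tuple[str, str], Dict[str, float]]


def update_belief(
    belief: List[float],
    envs: List[TransitionFn],
    state: str,
    action: str,
    successor: str,
) -> List[float]:
    """Bayes update of the belief over environments after one observed step."""
    # Weight each environment by how likely it makes the observed transition.
    weighted = [
        b * env.get((state, action), {}).get(successor, 0.0)
        for b, env in zip(belief, envs)
    ]
    total = sum(weighted)
    if total == 0.0:
        raise ValueError("observed transition has probability zero under this belief")
    # Normalise so the posterior sums to one.
    return [w / total for w in weighted]


# Illustrative two-environment example (made-up numbers).
envs: List[TransitionFn] = [
    {("s0", "a"): {"s1": 0.9, "s2": 0.1}},  # environment 0
    {("s0", "a"): {"s1": 0.1, "s2": 0.9}},  # environment 1
]

posterior = update_belief([0.5, 0.5], envs, "s0", "a", "s1")
```

In this sketch an environment whose belief weight is zero stays at zero after any update, so the support of the belief can only shrink over time. That is the simplest way to see why information about a fixed, hidden environment only accumulates.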
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Logic, Reasoning, and Knowledge · Reinforcement Learning in Robotics
