What should be observed for optimal reward in POMDPs?
Alyzia-Maria Konsta, Alberto Lluch Lafuente, Christoph Matheja

TL;DR
This paper studies how to select an agent's sensors in a POMDP within a fixed budget so that the (minimal) expected reward stays below a given threshold. It introduces the optimal observability problem (OOP), shows that it is undecidable in general, and gives algorithms for a decidable fragment.
Contribution
The paper formulates the optimal observability problem (OOP) for POMDPs, proves that it is undecidable in general, and presents two algorithms for a decidable fragment, based on MDP strategies and SMT.
Findings
OOP is undecidable in general
OOP is decidable when restricted to positional strategies
Algorithms show promising results on typical POMDP examples
Abstract
Partially observable Markov Decision Processes (POMDPs) are a standard model for agents making decisions in uncertain environments. Most work on POMDPs focuses on synthesizing strategies based on the available capabilities. However, system designers can often control an agent's observation capabilities, e.g. by placing or selecting sensors. This raises the question of how one should select an agent's sensors cost-effectively such that it achieves the desired goals. In this paper, we study the novel optimal observability problem OOP: Given a POMDP M, how should one change M's observation capabilities within a fixed budget such that its (minimal) expected reward remains below a given threshold? We show that the problem is undecidable in general and decidable when considering positional strategies only. We present two algorithms for a decidable fragment of the OOP: one based on optimal…
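To make the problem statement concrete, the following is a minimal brute-force sketch of the OOP restricted to positional, deterministic strategies. It is not one of the paper's algorithms. Everything in it is an assumption made for illustration: the five-state line world, the unit cost per step, the uniform start distribution, reading "below" as "at most", and the names `step`, `expected_steps`, `best_positional_value`, and `solve_oop`. A sensor placed on a state reveals that state exactly; all states without a sensor share a single "hidden" observation.

```python
from itertools import combinations, product

# Hypothetical toy model: states 0..4 on a line, goal in the middle.
N_STATES = 5
GOAL = 2
ACTIONS = ("left", "right")
START = [0, 1, 3, 4]   # assumed uniform initial distribution
HIDDEN = "hidden"      # shared observation for states without a sensor


def step(state, action):
    """Deterministic move; the walls at both ends block further movement."""
    if action == "left":
        return max(state - 1, 0)
    return min(state + 1, N_STATES - 1)


def expected_steps(sensors, strategy):
    """Expected number of steps to the goal under a positional strategy."""
    total = 0.0
    for start in START:
        state, steps = start, 0
        while state != GOAL:
            # A deterministic run that has not reached the goal after
            # N_STATES steps must have revisited a state, so it never will.
            if steps >= N_STATES:
                return float("inf")
            obs = state if state in sensors else HIDDEN
            state = step(state, strategy[obs])
            steps += 1
        total += steps
    return total / len(START)


def best_positional_value(sensors):
    """Minimal expected reward over all observation-to-action maps."""
    observations = sorted(sensors) + [HIDDEN]
    best = float("inf")
    for choice in product(ACTIONS, repeat=len(observations)):
        strategy = dict(zip(observations, choice))
        best = min(best, expected_steps(sensors, strategy))
    return best


def solve_oop(budget, threshold):
    """Return (sensor set, value) meeting the threshold, or None."""
    candidates = [s for s in range(N_STATES) if s != GOAL]
    for k in range(budget + 1):
        for sensors in combinations(candidates, k):
            value = best_positional_value(frozenset(sensors))
            if value <= threshold:
                return set(sensors), value
    return None


print(solve_oop(budget=2, threshold=1.5))  # ({0, 1}, 1.5)
print(solve_oop(budget=1, threshold=1.5))  # None
```

In this toy instance a single sensor is never enough: the unobserved states on both sides of the goal share one action, so at least one of them either runs into a wall or cycles. Two sensors on the same side of the goal suffice. The enumeration is exponential in both the number of sensors and the number of observations, so it only serves to illustrate the problem statement, not to solve realistic instances.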
Taxonomy
Topics: Reinforcement Learning in Robotics · Formal Methods in Verification · Bayesian Modeling and Causal Inference
