Under-Approximating Expected Total Rewards in POMDPs
Alexander Bork, Joost-Pieter Katoen, Tim Quatmann

TL;DR
This paper introduces methods for computing under-approximations (lower bounds) of the optimal expected total reward to reach a goal state in a POMDP. These bounds help answer whether that reward lies below a given threshold, a problem that is undecidable in general.
Contribution
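It proposes two techniques, cut-off and belief clipping, for computing tight lower bounds on expected total rewards in POMDPs. Cut-off relies on a good policy for the POMDP; belief clipping shifts minimal amounts of probability between beliefs, and these shifts are found with mixed-integer linear programming (MILP).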
Findings
The techniques scale quite well in the authors' experiments.
They provide tight lower bounds on the expected total reward.
The resulting bounds apply to threshold questions about the optimal expected total reward.
Abstract
We consider the problem: is the optimal expected total reward to reach a goal state in a partially observable Markov decision process (POMDP) below a given threshold? We tackle this -- generally undecidable -- problem by computing under-approximations on these total expected rewards. This is done by abstracting finite unfoldings of the infinite belief MDP of the POMDP. The key issue is to find a suitable under-approximation of the value function. We provide two techniques: a simple (cut-off) technique that uses a good policy on the POMDP, and a more advanced technique (belief clipping) that uses minimal shifts of probabilities between beliefs. We use mixed-integer linear programming (MILP) to find such minimal probability shifts and experimentally show that our techniques scale quite well while providing tight lower bounds on the expected total reward.
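The unfolding and cut-off ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a maximisation objective, deterministic state-based observations, and a fixed observation-based policy whose expected total reward per state is already known. The names `belief_successors`, `cutoff_value`, `trans`, `obs`, and `policy_values` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def belief_successors(belief, action, trans, obs):
    """One unfolding step of the belief MDP for a single action.

    belief: dict state -> probability
    trans:  dict (state, action) -> dict next_state -> probability
    obs:    dict state -> observation (assumed deterministic per state)
    Returns dict observation -> (probability, successor belief).
    """
    # Collect the probability mass reaching each state, grouped by observation.
    mass = defaultdict(lambda: defaultdict(float))
    for s, p in belief.items():
        for t, q in trans[(s, action)].items():
            mass[obs[t]][t] += p * q
    # Normalise each group to obtain the successor belief for that observation.
    result = {}
    for z, dist in mass.items():
        total = sum(dist.values())
        result[z] = (total, {t: m / total for t, m in dist.items()})
    return result


def cutoff_value(belief, policy_values):
    """Value used at a cut-off belief: the belief-weighted value of a fixed policy.

    Under the assumptions stated above, following that policy from the belief
    is one admissible behaviour, so this cannot exceed the optimal value.
    """
    return sum(p * policy_values[s] for s, p in belief.items())


# Toy POMDP (hypothetical): s0 and s1 look the same, goal is distinguishable.
trans = {
    ("s0", "a"): {"s0": 0.5, "goal": 0.5},
    ("s1", "a"): {"s1": 1.0},
}
obs = {"s0": "o", "s1": "o", "goal": "g"}
initial = {"s0": 0.5, "s1": 0.5}

succ = belief_successors(initial, "a", trans, obs)
prob_o, belief_o = succ["o"]
# Assumed per-state values of some fixed policy, for illustration only.
bound_o = cutoff_value(belief_o, {"s0": 2.0, "s1": 0.0, "goal": 0.0})
```

Applying `belief_successors` repeatedly yields a finite unfolding of the belief MDP. Beliefs that are not explored further are assigned `cutoff_value`, so the values computed on the finite fragment stay below the optimum under the stated assumptions. Belief clipping, which the abstract describes as the more advanced technique, is not covered by this sketch.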
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics
