Synthesizing POMDP Policies: Sampling Meets Model-checking via Learning
Debraj Chakraborty, Anirban Majumdar, Prince Mathew, Sayan Mukherjee, Jean-Fran\c{c}ois Raskin

TL;DR
This paper introduces a novel synthesis framework for POMDP policies that combines sampling, automata learning, and model-checking to produce formally guaranteed finite-state controllers.
Contribution
It integrates sampling with automata learning and model-checking, inspired by Angluin's $L^*$ algorithm, to synthesize POMDP policies with formal correctness guarantees.
Findings
Successfully solves threshold-safety problems challenging for existing tools.
Establishes a relative completeness result for the proposed framework.
Prototype implementation demonstrates practical effectiveness.
Abstract
Partially Observable Markov Decision Processes (POMDPs) are the standard framework for decision-making under uncertainty. While sampling-based methods scale well, they lack formal correctness guarantees, making them unsuitable for safety-critical applications. Conversely, formal synthesis techniques provide correctness-by-construction but often struggle with scalability, as general POMDP synthesis is undecidable. To bridge this gap, we propose a synthesis framework that integrates sampling, automata learning, and model-checking. Inspired by Angluin's algorithm, our approach utilizes sampling as a membership oracle and model-checking as an equivalence oracle. This enables the synthesis of finite-state controllers with formal guarantees, provided the sampling-induced policy is regular. We establish a relative completeness result for this framework. Experimental results from our…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
