Constrained and Robust Policy Synthesis with Satisfiability-Modulo-Probabilistic-Model-Checking
Linus Heck, Filip Mac\'ak, Milan \v{C}e\v{s}ka, Sebastian Junges

TL;DR
This paper introduces a novel framework for synthesizing reward-optimal policies for finite MDPs that are both robust to perturbations and satisfy complex structural constraints, leveraging satisfiability solvers and probabilistic model checking.
Contribution
It presents the first effective method to compute robust, constrained policies for MDPs using a flexible first-order constraint formulation integrated with satisfiability and probabilistic model checking.
Findings
Demonstrates feasibility on hundreds of benchmarks.
Achieves competitiveness with state-of-the-art methods.
Handles arbitrary structural constraints efficiently.
Abstract
The ability to compute reward-optimal policies for given and known finite Markov decision processes (MDPs) underpins a variety of applications across planning, controller synthesis, and verification. However, we often want policies (1) to be robust, i.e., they perform well on perturbations of the MDP and (2) to satisfy additional structural constraints regarding, e.g., their representation or implementation cost. Computing such robust and constrained policies is indeed computationally more challenging. This paper contributes the first approach to effectively compute robust policies subject to arbitrary structural constraints using a flexible and efficient framework. We achieve flexibility by allowing to express our constraints in a first-order theory over a set of MDPs, while the root for our efficiency lies in the tight integration of satisfiability solvers to handle the combinatorial…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsFormal Methods in Verification · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
