Lightweight Monte Carlo Verification of Markov Decision Processes with Rewards
Axel Legay, Sean Sedwards, Louis-Marie Traonouez

TL;DR
This paper introduces lightweight statistical model checking algorithms to optimize rewards in Markov decision processes, providing scalable alternatives for quantitative analysis beyond probability optimization.
Contribution
It extends existing sampling techniques to reward optimization in MDPs, including an auxiliary hypothesis test for reachability rewards, enhancing scalability and applicability.
Findings
Effective on standard case studies
Outperforms traditional methods in scalability
Accurately estimates reward metrics
Abstract
Markov decision processes are useful models of concurrency optimisation problems, but are often intractable for exhaustive verification methods. Recent work has introduced lightweight approximative techniques that sample directly from scheduler space, bringing the prospect of scalable alternatives to standard numerical model checking algorithms. The focus so far has been on optimising the probability of a property, but many problems require quantitative analysis of rewards. In this work we therefore present lightweight statistical model checking algorithms to optimise the rewards of Markov decision processes. We consider the standard definitions of rewards used in model checking, introducing an auxiliary hypothesis test to accommodate reachability rewards. We demonstrate the performance of our approach on a number of standard case studies.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSimulation Techniques and Applications · Formal Methods in Verification · Reinforcement Learning in Robotics
