Monte Carlo Tree Search for Verifying Reachability in Markov Decision Processes
Pranav Ashok, Tom\'a\v{s} Br\'azdil, Jan K\v{r}et\'insk\'y and, Ond\v{r}ej Sl\'ame\v{c}ka

TL;DR
This paper introduces Monte Carlo tree search (MCTS) based heuristics for verifying reachability in Markov decision processes, offering a scalable alternative to value iteration and BRTDP for large models.
Contribution
It presents a new class of MCTS-based heuristics for reachability verification, extending existing simulation-based methods with broader applicability and comparable efficiency.
Findings
MCTS-based heuristics outperform BRTDP on large models.
The proposed algorithms are more scalable than traditional value iteration.
The methods maintain guarantees while reducing computational overhead.
Abstract
The maximum reachability probabilities in a Markov decision process can be computed using value iteration (VI). Recently, simulation-based heuristic extensions of VI have been introduced, such as bounded real-time dynamic programming (BRTDP), which often manage to avoid explicit analysis of the whole state space while preserving guarantees on the computed result. In this paper, we introduce a new class of such heuristics, based on Monte Carlo tree search (MCTS), a technique celebrated in various machine-learning settings. We provide a spectrum of algorithms ranging from MCTS to BRTDP. We evaluate these techniques and show that for larger examples, where VI is no more applicable, our techniques are more broadly applicable than BRTDP with only a minor additional overhead.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsFormal Methods in Verification · Advanced Software Engineering Methodologies · Software Reliability and Analysis Research
