Finite-Time Analysis of MCTS in Continuous POMDP Planning
Da Kong, Vadim Indelman

TL;DR
This paper provides finite-time theoretical guarantees for Monte Carlo Tree Search in continuous POMDPs, introducing a new partitioning framework and a variant called Voro-POMCPOW with proven bounds.
Contribution
It extends finite-time analysis to continuous POMDPs by developing a Voronoi-based partitioning method and a new algorithm with theoretical guarantees.
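As a rough illustration of what a Voronoi-based partition of a continuous observation space looks like, the sketch below assigns an observation to the cell of its nearest center. This is a generic nearest-center rule, not the paper's construction: the function name `voronoi_cell`, the Euclidean metric, and the representation of observations as coordinate tuples are all assumptions made for the example.

```python
import math

def voronoi_cell(observation, centers):
    """Return the index of the Voronoi cell containing `observation`.

    Assumption: cells are induced by Euclidean distance to a list of
    center points; the summarized paper may use a different metric
    or a different way of choosing centers.
    """
    if not centers:
        raise ValueError("at least one center is required")
    # The cell of a point is the index of its nearest center.
    return min(range(len(centers)),
               key=lambda i: math.dist(observation, centers[i]))

# Hypothetical usage: two centers in a 2-D observation space.
centers = [(0.0, 0.0), (1.0, 1.0)]
cell = voronoi_cell((0.2, 0.1), centers)
```

Grouping observations by cell index lets a tree search treat all observations in one cell as a single branch, which is the kind of abstraction a partitioning loss would then have to account for.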
Findings
Voro-POMCPOW achieves competitive empirical performance.
Polynomial concentration bounds are established for discrete POMDPs.
Finite-time bounds on partitioning loss are derived for continuous observation spaces.
Abstract
This paper presents a finite-time analysis for Monte Carlo Tree Search (MCTS) in Partially Observable Markov Decision Processes (POMDPs), with probabilistic concentration bounds in both discrete and continuous observation spaces. While MCTS-style solvers such as POMCP achieve empirical success in many applications, rigorous finite-time guarantees remain an open problem due to the nonstationarity and the interdependencies induced by heuristic action selection (e.g., UCB). In the discrete setting, we address these challenges by extending the polynomial exploration bonus for UCB to the POMDP setting, yielding polynomial concentration bounds for the empirical value estimate at the root node. For continuous observation spaces, we introduce an abstract partitioning framework and propose a finite-time bound on partitioning loss. Under mild conditions, we prove a high-probability bound on value…
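To make the idea of a polynomial exploration bonus concrete, the following sketch scores each action with its empirical mean plus a bonus that grows polynomially in the parent visit count and shrinks polynomially in the child visit count, in contrast to the logarithmic bonus of standard UCB1. The function names, the generic form `c * N**alpha / n**beta`, and the default exponent values are illustrative assumptions; the abstract does not state the paper's constants or exact bonus.

```python
import math

def polynomial_ucb_score(mean_value, parent_visits, child_visits,
                         c=1.0, alpha=0.25, beta=0.5):
    """Empirical mean plus a polynomial exploration bonus.

    Assumption: bonus = c * N**alpha / n**beta, with N the parent
    visit count and n the child visit count. The exponents here are
    placeholders, not values taken from the paper.
    """
    if child_visits == 0:
        return math.inf  # unvisited actions are tried first
    return mean_value + c * parent_visits ** alpha / child_visits ** beta

def select_action(stats, c=1.0, alpha=0.25, beta=0.5):
    """Pick the action with the highest score.

    `stats` maps each action to a (visit_count, mean_value) pair.
    """
    parent_visits = sum(n for n, _ in stats.values())
    return max(
        stats,
        key=lambda a: polynomial_ucb_score(
            stats[a][1], parent_visits, stats[a][0], c, alpha, beta
        ),
    )

# Hypothetical usage: a rarely visited action gets a larger bonus.
stats = {"left": (10, 0.5), "right": (1, 0.4)}
chosen = select_action(stats)
```

With these placeholder settings the rarely visited action wins despite its lower mean, because the bonus term dominates at small visit counts.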
