Partially Observable Monte-Carlo Graph Search
Yang You, Vincent Thomas, Alex Schutz, Robert Skilton, Nick Hawes, Olivier Buffet

TL;DR
This paper introduces POMCGS, a novel offline sampling-based algorithm that constructs policy graphs for large POMDPs, enabling scalable, analyzable solutions that outperform previous offline methods and compete with online algorithms.
Contribution
POMCGS is the first offline algorithm to efficiently generate policies for large POMDPs by folding search trees into policy graphs, incorporating action widening and observation clustering.
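To make the term "policy graph" concrete, here is a minimal sketch of how such a graph can be represented and executed as a finite-state controller. It is illustrative only, not the paper's implementation: the names `PolicyNode` and `run_policy_graph` and the tiger-style action and observation labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PolicyNode:
    """One node of a policy graph: an action plus observation-labelled edges."""
    action: str
    # Maps each observation to the index of the successor node.
    successors: Dict[str, int] = field(default_factory=dict)


def run_policy_graph(nodes: List[PolicyNode], observations: List[str], start: int = 0) -> List[str]:
    """Follow the graph along a sequence of observations and return the actions taken."""
    current = start
    actions = [nodes[current].action]
    for obs in observations:
        current = nodes[current].successors[obs]
        actions.append(nodes[current].action)
    return actions


# Hypothetical two-node controller (labels are assumptions for illustration).
nodes = [
    PolicyNode("listen", {"hear_left": 1, "hear_right": 0}),
    PolicyNode("open_right", {"hear_left": 0, "hear_right": 0}),
]
```

Because execution is only a table lookup per step, a policy of this form needs no planning at run time, and each node and edge can be inspected before deployment.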
Findings
POMCGS can solve large, challenging POMDPs previously unsolvable offline.
Generated policies are competitive with state-of-the-art online algorithms.
Combined with action progressive widening and observation clustering, the method also handles continuous POMDPs.
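Action progressive widening is a general technique for search over continuous action spaces: a node is allowed a new child action only while its number of children is small relative to its visit count. The sketch below uses the common criterion `|children| <= k * N**alpha`. The constants `k` and `alpha` and the helper names are assumptions for illustration, and the exact rule used in POMCGS may differ.

```python
from typing import Callable, List


def should_widen(num_children: int, num_visits: int, k: float = 2.0, alpha: float = 0.5) -> bool:
    """Common progressive-widening test: allow a new child while
    num_children <= k * num_visits ** alpha (k and alpha are assumed values)."""
    return num_children <= k * num_visits ** alpha


def maybe_add_action(children: List[float], num_visits: int,
                     sample_action: Callable[[], float],
                     k: float = 2.0, alpha: float = 0.5) -> bool:
    """Append a freshly sampled action if the widening test passes.

    Returns True when a new action was added, False otherwise.
    """
    if should_widen(len(children), num_visits, k, alpha):
        children.append(sample_action())
        return True
    return False
```

With these assumed defaults, a node visited 4 times may hold up to 5 actions, so the branching factor grows slowly as more simulations pass through the node.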
Abstract
Currently, large partially observable Markov decision processes (POMDPs) are often solved by sampling-based online methods that interleave planning and execution phases. However, a pre-computed offline policy is preferable in POMDP applications with time or energy constraints, yet previous offline algorithms do not scale to large POMDPs. In this article, we propose a new sampling-based algorithm, partially observable Monte-Carlo graph search (POMCGS), to solve large POMDPs offline. Unlike many online POMDP methods, which progressively develop a tree while performing (Monte-Carlo) simulations, POMCGS folds this search tree on the fly to construct a policy graph, so that computations can be drastically reduced, and users can analyze and validate the policy prior to embedding and executing it. Moreover, POMCGS, together with action progressive widening and…
