Rollout Heuristics for Online Stochastic Contingent Planning
Oded Blumenthal, Guy Shani

TL;DR
This paper enhances online stochastic contingent planning for POMDPs by integrating domain-independent heuristics into Monte-Carlo planning, improving decision-making efficiency without relying on domain-specific heuristics.
Contribution
It introduces two novel heuristics for POMCP, leveraging classical planning heuristics and belief space analysis, to improve rollout quality in stochastic contingent planning.
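To make the idea of a heuristic-guided rollout concrete, here is a minimal sketch on a toy problem. It is not the paper's method: the one-dimensional domain, the `GOAL` constant, the `step` function, and the distance-based `heuristic` are all hypothetical stand-ins chosen for illustration. It only shows how a rollout policy that greedily follows a goal-distance estimate differs from a uniformly random one.

```python
import random

GOAL = 5            # hypothetical goal state of the toy domain
ACTIONS = (-1, 1)   # move left or move right

def heuristic(state):
    # Stand-in for a domain-independent heuristic: estimated distance to the goal.
    return abs(GOAL - state)

def step(state, action, rng):
    # Toy stochastic transition: the intended move succeeds with probability 0.8.
    nxt = state + action if rng.random() < 0.8 else state
    reward = 0.0 if nxt == GOAL else -1.0
    return nxt, reward

def greedy_policy(state, rng):
    # Pick the action whose intended successor has the lowest heuristic value.
    return min(ACTIONS, key=lambda a: heuristic(state + a))

def random_policy(state, rng):
    # Baseline rollout policy: choose uniformly at random.
    return rng.choice(ACTIONS)

def rollout(state, policy, rng, depth=20, gamma=0.95):
    # Simulate up to `depth` steps and return the discounted sum of rewards.
    total, discount = 0.0, 1.0
    for _ in range(depth):
        if state == GOAL:
            break
        action = policy(state, rng)
        state, reward = step(state, action, rng)
        total += discount * reward
        discount *= gamma
    return total
```

In a POMCP-style planner, a function like `rollout` would be called on a state sampled at a leaf of the search tree, and its return would serve as that leaf's value estimate. Averaged over many runs from state 0, the greedy policy in this toy domain should yield noticeably higher returns than the random one.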
Findings
Heuristics improve planning efficiency.
Belief space heuristic accounts for information value.
Domain-independent heuristics reduce reliance on domain-specific tuning.
Abstract
Partially observable Markov decision processes (POMDPs) are a useful model for decision-making under partial observability and stochastic actions. Partially Observable Monte-Carlo Planning (POMCP) is an online algorithm for deciding on the next action to perform, using a Monte-Carlo tree search approach based on the UCT (UCB applied to trees) algorithm for fully observable Markov decision processes. POMCP develops an action-observation tree and, at the leaves, uses a rollout policy to provide a value estimate for the leaf. As such, POMCP depends heavily on the rollout policy to compute good estimates and hence to identify good actions. Thus, many practitioners who use POMCP are required to create strong, domain-specific heuristics. In this paper, we model POMDPs as stochastic contingent planning problems. This allows us to leverage domain-independent heuristics that were developed in the…
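The UCB rule that UCT applies at each tree node can be sketched as follows. This is an illustrative implementation of the standard UCB1 selection rule, not code from the paper; the function name `ucb1_select`, the dictionary-based statistics, and the exploration constant `c` are assumptions made for the example.

```python
import math

def ucb1_select(counts, values, c=1.0):
    """Return the action with the highest mean value plus exploration bonus.

    counts: dict mapping action -> number of times it was tried at this node
    values: dict mapping action -> mean return observed for that action
    c:      exploration constant (hypothetical default)
    """
    total = sum(counts.values())
    best_action, best_score = None, -math.inf
    for action, n in counts.items():
        if n == 0:
            # Try every action once before relying on the bonus term.
            return action
        score = values[action] + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

With `counts = {"a": 10, "b": 1}` and `values = {"a": 0.5, "b": 0.4}`, the rarely tried action `"b"` is selected because its exploration bonus outweighs its slightly lower mean. The quality of the `values` estimates is where the rollout policy matters, since each new leaf's value comes from a rollout.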
Taxonomy
Methods: Monte-Carlo Tree Search
