Heuristics for Partially Observable Stochastic Contingent Planning
Guy Shani

TL;DR
This paper introduces a new heuristic for goal-based POMDPs that improves the efficiency of the RTDP-BEL algorithm by reducing the number of trajectories needed for convergence, especially in information-gathering tasks.
Contribution
The paper develops a heuristic function that exploits the structured representation of the domain model. It guides RTDP-BEL more effectively by accounting for the value of information and for stochastic effects, leading to faster convergence.
Findings
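The heuristic reduces the number of trajectories RTDP-BEL needs before convergence by an order of magnitude.
The heuristic is slower to compute per call, but the reduction in trajectories speeds up RTDP-BEL overall.
The improvements are most significant in problems that require information gathering.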
Abstract
Acting to complete tasks in stochastic partially observable domains is an important problem in artificial intelligence, and is often formulated as a goal-based POMDP. Goal-based POMDPs can be solved using the RTDP-BEL algorithm, which operates by running forward trajectories from the initial belief to the goal. These trajectories can be guided by a heuristic, and more accurate heuristics can result in significantly faster convergence. In this paper, we develop a heuristic function that leverages the structured representation of domain models. We compute, in a relaxed space, a plan to achieve the goal, while taking into account the value of information, as well as the stochastic effects. We provide experiments showing that while our heuristic is slower to compute, it requires an order of magnitude fewer trajectories before convergence. Overall, it thus speeds up RTDP-BEL, particularly in…
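To make the role of the heuristic concrete, here is a minimal Python sketch of an RTDP-BEL-style loop on a toy corridor problem. Everything in it is an assumption made for illustration: the four-state domain, the 0.8 success probability, the discretization resolution, and all function names are hypothetical. In particular, `heuristic` is a simple expected-distance estimate standing in for the paper's relaxed-plan heuristic, which is not reproduced here.

```python
import random

# Toy goal POMDP (hypothetical): a corridor of 4 cells, goal in the last one.
N_STATES = 4
GOAL = 3
ACTIONS = ("left", "right")
P_SUCCESS = 0.8  # assumed probability that a move succeeds


def transition(s, a):
    """Return {next_state: probability} for taking action a in state s."""
    if s == GOAL:
        return {GOAL: 1.0}  # the goal is absorbing
    target = min(s + 1, N_STATES - 1) if a == "right" else max(s - 1, 0)
    if target == s:
        return {s: 1.0}
    return {target: P_SUCCESS, s: 1.0 - P_SUCCESS}


def successors(b, a):
    """Return [(P(o | b, a), next_belief)] for the observation 'at goal?'."""
    pred = [0.0] * N_STATES
    for s, p in enumerate(b):
        for s2, t in transition(s, a).items():
            pred[s2] += p * t
    out = []
    for at_goal in (False, True):
        # The observation is deterministic: it reveals whether s' is the goal.
        unnorm = [pred[s] if (s == GOAL) == at_goal else 0.0
                  for s in range(N_STATES)]
        z = sum(unnorm)
        if z > 1e-12:
            out.append((z, tuple(x / z for x in unnorm)))
    return out


def is_goal(b):
    return b[GOAL] > 1.0 - 1e-9


def heuristic(b):
    """Stand-in heuristic: expected number of cells to the goal."""
    return sum(p * (GOAL - s) for s, p in enumerate(b))


def belief_key(b, resolution=20):
    """Discretize a belief so that it can index the value table."""
    return tuple(round(p * resolution) for p in b)


def value(V, b):
    """Look up V(b); unseen beliefs are initialized by the heuristic."""
    if is_goal(b):
        return 0.0
    return V.get(belief_key(b), heuristic(b))


def q_value(V, b, a):
    """Unit action cost plus the expected value of the successor beliefs."""
    return 1.0 + sum(p * value(V, nb) for p, nb in successors(b, a))


def rtdp_bel(b0, n_trials=200, max_steps=100, seed=0):
    """Run greedy forward trajectories from b0, updating V along the way."""
    rng = random.Random(seed)
    V = {}
    for _ in range(n_trials):
        b = b0
        for _ in range(max_steps):
            if is_goal(b):
                break
            q = {a: q_value(V, b, a) for a in ACTIONS}
            best = min(q, key=q.get)
            V[belief_key(b)] = q[best]  # Bellman update at the current belief
            succ = successors(b, best)
            # Sample the next observation and move to the resulting belief.
            b = rng.choices([nb for _, nb in succ],
                            weights=[p for p, _ in succ])[0]
    return V


if __name__ == "__main__":
    b0 = (1 / 3, 1 / 3, 1 / 3, 0.0)
    V = rtdp_bel(b0)
    print("heuristic estimate at b0:", heuristic(b0))
    print("value after trials at b0:", V[belief_key(b0)])
```

The heuristic enters only through `value`, where it initializes beliefs that have not yet been updated. A more informed estimate there steers the greedy trajectories toward useful beliefs earlier, which is the effect the abstract attributes to its heuristic.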
Taxonomy
Topics: AI-based Problem Solving and Planning
