Leveraging the Value of Information in POMDP Planning
Zakariya Laouar, Qi Heng Ho, Zachary Sunberg

TL;DR
This paper introduces a novel Monte Carlo Tree Search algorithm, VOIMCP, that efficiently plans in POMDPs by selectively ignoring low-value observations, improving performance under limited computation.
Contribution
The paper presents a dynamic programming framework for VOI-aware planning and a new algorithm, VOIMCP, with theoretical guarantees and empirical improvements.
Findings
VOIMCP outperforms baseline methods on POMDP benchmarks.
The framework provides theoretical near-optimality guarantees.
Selective observation processing reduces computational effort.
Abstract
Partially observable Markov decision processes (POMDPs) offer a principled formalism for planning under state and transition uncertainty. Despite advances made towards solving large POMDPs, obtaining performant policies under limited planning time remains a major challenge due to the curse of dimensionality and the curse of history. For many POMDP problems, the value of information (VOI) - the expected performance gain from reasoning about observations - varies over the belief space. We introduce a dynamic programming framework that exploits this structure by conditionally processing observations based on the value of information at each belief. Building on this framework, we propose Value of Information Monte Carlo planning (VOIMCP), a Monte Carlo Tree Search algorithm that allocates computational effort more efficiently by selectively disregarding observation information when the VOI…
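To make the notion of VOI in the abstract concrete, the sketch below computes a one-step value of information for a small two-state problem: the expected reward when the agent acts after seeing one observation, minus the expected reward when it acts on the current belief alone. This is an illustration, not the paper's framework or the VOIMCP algorithm. The reward table, the 0.95 observation accuracy, and all function names are hypothetical choices made for this example.

```python
# Illustrative one-step value of information (VOI) for a two-state problem.
# All numbers and names are hypothetical; this is not the paper's algorithm.

# REWARD[s][a]: reward for taking action a in hidden state s.
# Actions: 0 = open left, 1 = open right, 2 = stay.
REWARD = [
    [-100.0, 10.0, 0.0],   # state 0: hazard behind the left door
    [10.0, -100.0, 0.0],   # state 1: hazard behind the right door
]

# OBS[s][o]: probability of observing o when the hidden state is s.
OBS = [
    [0.95, 0.05],
    [0.05, 0.95],
]

N_STATES, N_ACTIONS, N_OBS = 2, 3, 2


def value_without_observation(belief):
    """Best expected reward when acting on the belief alone."""
    return max(
        sum(belief[s] * REWARD[s][a] for s in range(N_STATES))
        for a in range(N_ACTIONS)
    )


def value_with_observation(belief):
    """Expected reward when one observation is seen before acting.

    Summing the unnormalized joint weights b(s) * P(o | s) over observations
    equals the observation-probability-weighted value of each posterior.
    """
    total = 0.0
    for o in range(N_OBS):
        joint = [belief[s] * OBS[s][o] for s in range(N_STATES)]
        total += max(
            sum(joint[s] * REWARD[s][a] for s in range(N_STATES))
            for a in range(N_ACTIONS)
        )
    return total


def one_step_voi(belief):
    """Expected gain from conditioning the action on one observation."""
    return value_with_observation(belief) - value_without_observation(belief)


if __name__ == "__main__":
    for p in (0.5, 0.8, 0.99):
        belief = [p, 1.0 - p]
        print(f"P(state 0) = {p:.2f}  one-step VOI = {one_step_voi(belief):.3f}")
```

In this toy setting the VOI is larger at the uncertain belief (0.5, 0.5) than at the confident belief (0.99, 0.01), which is the kind of variation over the belief space that the abstract describes.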
