Monte Carlo Information-Oriented Planning
Vincent Thomas, Gérémy Hutin, Olivier Buffet

TL;DR
This paper introduces a Monte Carlo Tree Search method for solving rho-POMDPs, enabling efficient online planning for belief-dependent rewards with proven convergence and superior performance over myopic strategies.
Contribution
It extends POMCP to rho-POMDPs, allowing belief-dependent reward optimization with convergence guarantees and practical efficiency.
Findings
The proposed algorithm outperforms myopic approaches in experiments.
The planner can be used with any rho function.
Asymptotic convergence to epsilon-optimal values is proven when rho is continuous.
Abstract
In this article, we discuss how to solve information-gathering problems expressed as rho-POMDPs, an extension of Partially Observable Markov Decision Processes (POMDPs) whose reward rho depends on the belief state. Point-based approaches used for solving POMDPs have been extended to solving rho-POMDPs as belief MDPs when the reward rho is convex in B or when it is Lipschitz-continuous. In the present paper, we build on the POMCP algorithm to propose a Monte Carlo Tree Search for rho-POMDPs, aiming for an efficient on-line planner which can be used for any rho function. Because the rewards are belief-dependent, adaptations are required to (i) propagate more than one state at a time, and (ii) prevent biases in value estimates. An asymptotic convergence proof to epsilon-optimal values is given when rho is continuous. Experiments are conducted to analyze the algorithms at hand and show that they…
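To illustrate why a belief-dependent reward calls for propagating more than one state at a time, here is a minimal Python sketch. It is not the paper's algorithm: the choice of negative entropy as rho, the helper names `neg_entropy` and `belief_from_particles`, and the state labels are all assumptions made for this example. It only shows that a reward computed from a single sampled state is degenerate, whereas a bag of sampled states gives an informative belief estimate.

```python
import math
from collections import Counter


def neg_entropy(belief):
    """Hypothetical rho(b) = sum_s b(s) * log b(s).

    Convex in b; equals 0 (its maximum) when the belief is deterministic.
    """
    return sum(p * math.log(p) for p in belief.values() if p > 0.0)


def belief_from_particles(particles):
    """Empirical belief estimated from a bag of sampled states."""
    counts = Counter(particles)
    n = len(particles)
    return {state: count / n for state, count in counts.items()}


# One particle: the empirical belief is always deterministic, so this
# rho is 0 regardless of how uncertain the agent actually is.
single = belief_from_particles(["s0"])

# Several particles: the empirical belief can reflect real uncertainty.
bag = belief_from_particles(["s0", "s1", "s0", "s1"])

print(neg_entropy(single))  # 0.0
print(neg_entropy(bag))     # -log(2), about -0.693
```

With a single particle per simulation, as in a state-based simulator, every visited node would appear fully informed under this rho, which is one way to see why belief-dependent rewards need a set of states at each node.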
Taxonomy
Topics: Simulation Techniques and Applications
