B$^3$RTDP: A Belief Branch and Bound Real-Time Dynamic Programming Approach to Solving POMDPs
Sigurdur Orn Adalgeirsson, Cynthia Breazeal

TL;DR
This paper introduces B$^3$RTDP, an algorithm for solving POMDPs that improves efficiency through belief bounding and a convergence frontier technique, and that achieves higher returns than existing methods such as SARSOP.
Contribution
The paper presents B$^3$RTDP, a novel extension of RTDP-Bel that incorporates belief bounding and convergence frontier techniques for more efficient POMDP solving.
Findings
B$^3$RTDP achieves higher returns than SARSOP on benchmark problems.
The algorithm converges faster, reducing computational time.
Empirical results demonstrate improved efficiency over prior methods.
Abstract
Partially Observable Markov Decision Processes (POMDPs) offer a promising world representation for autonomous agents, as they can model both transitional and perceptual uncertainties. Calculating the optimal solution to POMDP problems can be computationally expensive as they require reasoning over the (possibly infinite) space of beliefs. Several approaches have been proposed to overcome this difficulty, such as discretizing the belief space, point-based belief sampling, and Monte Carlo tree search. The Real-Time Dynamic Programming approach of the RTDP-Bel algorithm approximates the value function by storing it in a hashtable with discretized belief keys. We propose an extension to the RTDP-Bel algorithm which we call Belief Branch and Bound RTDP (B$^3$RTDP). Our algorithm uses a bounded value function representation and takes advantage of this in two novel ways: a search-bounding…
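The abstract describes two data-structure ideas: RTDP-Bel's value function stored in a hashtable keyed by discretized beliefs, and B$^3$RTDP's bounded (lower/upper) value representation. The sketch below illustrates both under stated assumptions; it is not the authors' implementation. It assumes a flat state space with beliefs as Python lists, transition and observation tables indexed as `T[a][s][s']` and `O[a][s'][o]`, a rounding-based discretization with a hypothetical `resolution` parameter, and heuristic bound initializers supplied by the caller. All names (`belief_update`, `discretize`, `BoundedValueTable`, `prune_actions`) are hypothetical. `prune_actions` is a generic branch-and-bound rule shown only to illustrate why bounds help; it is not claimed to be the paper's search-bounding technique.

```python
from dataclasses import dataclass


def belief_update(belief, action, obs, T, O):
    """Bayes filter: b'(s') is proportional to O[a][s'][o] * sum_s T[a][s][s'] * b(s)."""
    n = len(belief)
    unnormalized = [
        O[action][s2][obs] * sum(T[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    z = sum(unnormalized)  # probability of observing obs after taking action
    if z == 0:
        raise ValueError("observation has zero probability under this belief and action")
    return [p / z for p in unnormalized]


def discretize(belief, resolution=10):
    """Hashtable key (assumed scheme): each probability mapped to an integer grid of step 1/resolution."""
    return tuple(round(resolution * p) for p in belief)


@dataclass
class Bounds:
    lower: float
    upper: float

    @property
    def gap(self):
        return self.upper - self.lower


class BoundedValueTable:
    """Hypothetical helper: lower/upper value bounds stored under discretized belief keys."""

    def __init__(self, lower_init, upper_init, resolution=10):
        self.table = {}
        self.lower_init = lower_init  # heuristic lower bound, called on first visit to a key
        self.upper_init = upper_init  # heuristic upper bound, called on first visit to a key
        self.resolution = resolution

    def get(self, belief):
        key = discretize(belief, self.resolution)
        if key not in self.table:
            self.table[key] = Bounds(self.lower_init(belief), self.upper_init(belief))
        return self.table[key]

    def update(self, belief, lower, upper):
        # Overwrite both bounds with freshly backed-up values for this key.
        entry = self.get(belief)
        entry.lower, entry.upper = lower, upper

    def converged(self, belief, epsilon=1e-3):
        # The value at this key is considered settled once the bounds are within epsilon.
        return self.get(belief).gap <= epsilon


def prune_actions(q_bounds):
    """Generic branch and bound (maximizing reward): keep only actions whose
    upper bound is at least the best lower bound among all actions."""
    best_lower = max(b.lower for b in q_bounds.values())
    return [a for a, b in q_bounds.items() if b.upper >= best_lower]
```

With this scheme, beliefs such as `[0.5, 0.5]` and `[0.51, 0.49]` map to the same key `(5, 5)` at resolution 10 and therefore share one table entry, which keeps the table finite at the cost of resolution. Storing two bounds instead of one value gives a direct convergence test (the gap) and lets a search skip actions that provably cannot be optimal.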
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
