Online Planning for Constrained POMDPs with Continuous Spaces through Dual Ascent
Arec Jamgochian, Anthony Corso, Mykel J. Kochenderfer

TL;DR
This paper introduces online planning algorithms for constrained POMDPs with continuous spaces, combining dual ascent and progressive widening, and demonstrates their effectiveness on safety-critical problems.
Contribution
The paper presents novel algorithms for online CPOMDP planning in continuous state, action, and observation spaces. It extends previous online methods, which were limited to discrete action and observation spaces, by combining dual ascent with progressive widening.
Findings
The proposed algorithms outperform existing methods on continuous CPOMDPs that model toy and real-world safety-critical problems.
Dual ascent combined with progressive widening effectively handles continuous CPOMDPs.
Optimistic cost propagation influences planning performance.
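Progressive widening, mentioned in the findings above, can be illustrated with a minimal sketch. This is not the paper's implementation: it uses the commonly cited widening rule (add a new child while the child count is at most k * N^alpha), and the names `should_widen`, `select_action`, `k`, and `alpha` are hypothetical choices made for illustration.

```python
import random

def should_widen(num_children, num_visits, k=1.0, alpha=0.5):
    """Allow a new child while the child count is at most k * N^alpha."""
    return num_children <= k * (num_visits ** alpha)

def select_action(children, num_visits, sample_action, k=1.0, alpha=0.5, rng=random):
    """Sample a fresh continuous action if widening is allowed, else reuse one."""
    if should_widen(len(children), num_visits, k, alpha):
        action = sample_action()
        children.append(action)
        return action
    return rng.choice(children)

# Toy usage: actions are drawn uniformly from [-1, 1] (an assumption).
rng = random.Random(0)
children = []
for visits in range(100):
    select_action(children, visits, lambda: rng.uniform(-1.0, 1.0), rng=rng)
```

After 100 visits the node holds far fewer than 100 children, which is the point of the rule: the branching factor grows sublinearly with the visit count, so each sampled action is revisited often enough for its value estimate to become useful.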
Abstract
Rather than augmenting rewards with penalties for undesired behavior, Constrained Partially Observable Markov Decision Processes (CPOMDPs) plan safely by imposing inviolable hard constraint value budgets. Previous work performing online planning for CPOMDPs has only been applied to discrete action and observation spaces. In this work, we propose algorithms for online CPOMDP planning for continuous state, action, and observation spaces by combining dual ascent with progressive widening. We empirically compare the effectiveness of our proposed algorithms on continuous CPOMDPs that model both toy and real-world safety-critical problems. Additionally, we compare against the use of online solvers for continuous unconstrained POMDPs that scalarize cost constraints into rewards, and investigate the effect of optimistic cost propagation.
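The dual ascent idea named in the abstract can be sketched on a toy problem. This is only an illustration under stated assumptions, not the authors' algorithm: the reward and cost values are fixed, hypothetical numbers, whereas an online planner would estimate them from tree search. The function name `dual_ascent`, the step size, and the budget are likewise assumptions. The sketch picks the action maximizing the Lagrangian value Q_R - lambda * Q_C, then raises lambda when the incurred cost exceeds the budget and lowers it otherwise, projecting onto lambda >= 0.

```python
def dual_ascent(q_reward, q_cost, budget, step=0.05, iters=200):
    """Projected dual ascent on a single Lagrange multiplier (toy sketch)."""
    lam = 0.0
    costs = []
    for _ in range(iters):
        # Greedy action under the current scalarized objective.
        a = max(range(len(q_reward)),
                key=lambda i: q_reward[i] - lam * q_cost[i])
        costs.append(q_cost[a])
        # Subgradient step on the dual variable, clipped at zero.
        lam = max(0.0, lam + step * (q_cost[a] - budget))
    return lam, costs

# Hypothetical values: higher reward comes with higher cost.
lam, costs = dual_ascent(q_reward=[1.0, 2.0, 3.0],
                         q_cost=[0.0, 1.0, 3.0],
                         budget=0.5)
avg_recent_cost = sum(costs[-100:]) / 100
```

In this toy setting no single action has a cost equal to the budget, so the multiplier settles into oscillating around the value at which the two nearest actions tie, and the long-run average cost approaches the budget. Scalarizing costs into rewards with a fixed penalty, the baseline mentioned in the abstract, corresponds to holding `lam` constant instead of adapting it.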
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Formal Methods in Verification · Reinforcement Learning in Robotics
