Recursively-Constrained Partially Observable Markov Decision Processes
Qi Heng Ho, Tyler Becker, Benjamin Kraske, Zakariya Laouar, Martin S. Feather, Federico Rossi, Morteza Lahijanian, Zachary N. Sunberg

TL;DR
This paper introduces RC-POMDPs, a new framework that addresses limitations of C-POMDPs by ensuring optimal policies obey Bellman's principle, enabling more reliable decision-making in constrained, partially observable environments.
Contribution
The paper proposes RC-POMDPs, a novel model that always admits a deterministic optimal policy and whose optimal policies obey Bellman's principle, overcoming key issues in C-POMDPs.
Findings
RC-POMDPs always have deterministic optimal policies.
Policies for RC-POMDPs exhibit more desirable behaviors than policies for C-POMDPs.
The proposed algorithm performs effectively on benchmark problems.
Abstract
Many sequential decision problems involve optimizing one objective function while imposing constraints on other objectives. Constrained Partially Observable Markov Decision Processes (C-POMDPs) model this case with transition uncertainty and partial observability. In this work, we first show that C-POMDPs violate the optimal substructure property over successive decision steps and thus may exhibit behaviors that are undesirable for some (e.g., safety-critical) applications. Additionally, online re-planning in C-POMDPs is often ineffective due to the inconsistency resulting from this violation. To address these drawbacks, we introduce the Recursively-Constrained POMDP (RC-POMDP), which imposes additional history-dependent cost constraints on the C-POMDP. We show that, unlike C-POMDPs, RC-POMDPs always have deterministic optimal policies and that optimal policies obey Bellman's principle…
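The difference between a single expected-cost constraint and a history-dependent one can be illustrated with a toy calculation. The sketch below is not the authors' algorithm. Every number in it (`gamma`, `c_hat`, the two branches) is made up for illustration, and the budget update `(budget - step_cost) / gamma` is an assumption about one natural way to carry a cost bound forward along a history; the abstract itself does not spell out the recursion.

```python
def next_budget(budget, step_cost, gamma):
    """Hypothetical budget propagation: remove the cost just paid, undo one discount step."""
    return (budget - step_cost) / gamma


gamma = 1.0      # discount factor (illustrative)
c_hat = 0.5      # cost budget at the initial history (illustrative)
step_cost = 0.0  # cost incurred at the first step

# After the first step the history splits into two branches:
# name -> (probability, expected cost-to-go under some fixed policy)
branches = {"A": (0.5, 1.0), "B": (0.5, 0.0)}

# C-POMDP-style check: only the expected cost from the initial history matters.
expected_total = step_cost + gamma * sum(p * v for p, v in branches.values())
c_pomdp_ok = expected_total <= c_hat

# Recursive-style check: each reachable history must also respect its own budget.
d1 = next_budget(c_hat, step_cost, gamma)
rc_ok = c_pomdp_ok and all(v <= d1 for _, v in branches.values())

print(f"expected cost from start: {expected_total}, budget: {c_hat}, satisfied: {c_pomdp_ok}")
print(f"budget after first step: {d1}, all branches within budget: {rc_ok}")
```

In this toy case the policy meets the constraint in expectation from the start, yet on branch A its expected remaining cost exceeds the budget carried forward to that branch. A re-planner that restarts from branch A with the original constraint would see a different problem than the one the initial plan solved, which is the kind of inconsistency the abstract attributes to C-POMDPs.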
Taxonomy
Topics: Reinforcement Learning in Robotics · Formal Methods in Verification · Transportation and Mobility Innovations
