Recursively-Constrained Partially Observable Markov Decision Processes

Qi Heng Ho; Tyler Becker; Benjamin Kraske; Zakariya Laouar; Martin S. Feather; Federico Rossi; Morteza Lahijanian; Zachary N. Sunberg

arXiv:2310.09688·cs.AI·June 6, 2024·1 cites

Open Access

TL;DR

This paper introduces RC-POMDPs, a new framework that addresses limitations of C-POMDPs by ensuring optimal policies obey Bellman's principle, enabling more reliable decision-making in constrained, partially observable environments.

Contribution

The paper proposes RC-POMDPs, a model in which a deterministic optimal policy always exists and optimal policies obey Bellman's principle of optimality, two properties that C-POMDPs lack.

Findings

01 RC-POMDPs always have deterministic optimal policies.

02 Policies for RC-POMDPs exhibit more desirable behaviors than policies for C-POMDPs.

03 The proposed algorithm performs effectively on benchmark problems.

Abstract

Many sequential decision problems involve optimizing one objective function while imposing constraints on other objectives. Constrained Partially Observable Markov Decision Processes (C-POMDPs) model this case with transition uncertainty and partial observability. In this work, we first show that C-POMDPs violate the optimal substructure property over successive decision steps and thus may exhibit behaviors that are undesirable for some (e.g., safety-critical) applications. Additionally, online re-planning in C-POMDPs is often ineffective due to the inconsistency resulting from this violation. To address these drawbacks, we introduce the Recursively-Constrained POMDP (RC-POMDP), which imposes additional history-dependent cost constraints on the C-POMDP. We show that, unlike C-POMDPs, RC-POMDPs always have deterministic optimal policies and that optimal policies obey Bellman's principle…
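
As a rough illustration of why a constraint on expected cost alone can call for randomized policies, the sketch below sets up a one-step toy problem. It is not taken from the paper: the two actions, their reward and cost values, the budget of 0.5, and the names `evaluate` and `best_policy` are all assumptions made for this example. With a constraint only on expected cost, the best feasible policy here mixes the two actions, while the best deterministic policy is strictly worse.

```python
# Toy one-step constrained decision problem.
# All numbers and names are illustrative assumptions, not from the paper.
ACTIONS = {"safe": (0.0, 0.0), "risky": (1.0, 1.0)}  # action -> (reward, cost)
COST_BUDGET = 0.5  # upper bound on expected cost


def evaluate(p_risky):
    """Expected (reward, cost) when 'risky' is chosen with probability p_risky."""
    r_safe, c_safe = ACTIONS["safe"]
    r_risky, c_risky = ACTIONS["risky"]
    reward = (1 - p_risky) * r_safe + p_risky * r_risky
    cost = (1 - p_risky) * c_safe + p_risky * c_risky
    return reward, cost


def best_policy(candidates):
    """Return the feasible candidate with the highest expected reward."""
    feasible = [p for p in candidates if evaluate(p)[1] <= COST_BUDGET + 1e-12]
    return max(feasible, key=lambda p: evaluate(p)[0])


# Deterministic policies pick one action with probability 0 or 1.
best_deterministic = best_policy([0.0, 1.0])
# Stochastic policies may mix; search a grid of mixing probabilities.
best_stochastic = best_policy([i / 100 for i in range(101)])

print("deterministic:", best_deterministic, evaluate(best_deterministic))
print("stochastic:   ", best_stochastic, evaluate(best_stochastic))
```

In this toy setting the only feasible deterministic choice is the safe action, whereas mixing the two actions uses the full cost budget and earns more expected reward. The example shows only the general phenomenon in expectation-constrained problems; it does not model partial observability or the recursive constraints that RC-POMDPs add.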

Taxonomy

Topics: Reinforcement Learning in Robotics · Formal Methods in Verification · Transportation and Mobility Innovations