Shielding in Resource-Constrained Goal POMDPs
Michal Ajdarów, Šimon Brlej, Petr Novotný

TL;DR
This paper combines formal shielding techniques with POMCP heuristic search to solve resource-constrained goal optimization problems in POMDPs, preventing resource exhaustion while minimizing the expected cost of reaching a goal.
Contribution
It presents a novel two-step approach: designing shields via formal methods and integrating them with POMCP for resource-aware POMDP planning.
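One natural way to picture the integration step (our assumption; the paper's exact integration may differ) is that, inside POMCP's tree search, UCB1 ranges only over the actions the shield permits at the current node, and rollouts likewise sample only permitted actions. The sketch below shows just the selection rule. The function `shielded_ucb1` and the `stats` layout are hypothetical, and cost minimization is encoded as maximizing the negated cost.

```python
import math
from typing import Dict, List, Tuple


def shielded_ucb1(stats: Dict[str, Tuple[int, float]], allowed: List[str],
                  c: float = 1.0) -> str:
    """UCB1 action selection restricted to the actions a shield allows.

    `stats` maps action -> (visit count, mean return), with return = -cost,
    so maximizing UCB1 favours cheap actions. (Hypothetical layout.)
    """
    if not allowed:
        raise ValueError("shield allows no action in this node")
    # Every allowed action is tried once before the UCB1 formula is used.
    for a in allowed:
        if stats.get(a, (0, 0.0))[0] == 0:
            return a
    # Node visit count, taken here as the sum over the allowed actions.
    total = sum(stats[a][0] for a in allowed)

    def ucb(a: str) -> float:
        n, mean = stats[a]
        return mean + c * math.sqrt(math.log(total) / n)

    return max(allowed, key=ucb)


stats = {"fwd": (10, -2.0), "back": (5, -3.0), "wait": (0, 0.0)}
print(shielded_ucb1(stats, ["fwd", "back"]))          # fwd ("wait" is blocked)
print(shielded_ucb1(stats, ["fwd", "back", "wait"]))  # wait (unvisited, tried first)
```

Because blocked actions never enter the argmax, the search tree never expands a branch the shield forbids, even when that branch looks attractive in terms of cost.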
Findings
The combined algorithm successfully prevents resource exhaustion in benchmark scenarios.
The approach improves planning efficiency in resource-constrained POMDPs.
Experimental results demonstrate the method's applicability and effectiveness.
Abstract
We consider partially observable Markov decision processes (POMDPs) modeling an agent that needs a supply of a certain resource (e.g., electricity stored in batteries) to operate correctly. The resource is consumed by the agent's actions and can be replenished only in certain states. The agent aims to minimize the expected cost of reaching some goal while preventing resource exhaustion, a problem we call resource-constrained goal optimization (RSGO). We take a two-step approach to the RSGO problem. First, using formal methods techniques, we design an algorithm computing a shield for a given scenario: a procedure that observes the agent and prevents it from using actions that might eventually lead to resource exhaustion. Second, we augment the POMCP heuristic search algorithm for POMDP planning with our shields to obtain an algorithm solving the RSGO problem. We implement our…
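To make the shield idea concrete, here is a minimal sketch under simplifying assumptions that are ours, not the paper's. The state is fully observable (the paper handles partial observability), each action consumes an integer amount of resource, arriving at a reload state refills the resource to a capacity `cap`, and an action is blocked whenever some possible outcome could eventually force exhaustion. The model format and the helpers `min_safe_levels` and `shield` are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical model format (not from the paper):
# (state, action) -> (resource consumed, list of possible successor states).
Model = Dict[Tuple[str, str], Tuple[int, List[str]]]


def actions_of(model: Model, state: str) -> List[str]:
    """Actions available in `state`, in the model's insertion order."""
    return [a for (s, a) in model if s == state]


def action_need(model: Model, need: Dict[str, float], state: str, action: str) -> float:
    """Level required to play `action` in `state` and stay safe whatever the outcome."""
    cost, successors = model[(state, action)]
    return cost + max(need[t] for t in successors)


def min_safe_levels(model: Model, reload: Set[str], cap: int) -> Dict[str, float]:
    """Minimal resource level needed on arrival at each state to avoid exhaustion
    forever against every possible outcome (math.inf = no safe level exists).
    Computed as a least fixed point by iterating upward from 0."""
    states = {s for (s, _) in model} | {t for _, succ in model.values() for t in succ}
    need: Dict[str, float] = {s: 0 for s in states}
    changed = True
    while changed:
        changed = False
        for s in sorted(states):
            best = min((action_need(model, need, s, a) for a in actions_of(model, s)),
                       default=math.inf)  # states without actions count as unsafe
            if best > cap:
                new = math.inf  # even a full resource supply is not enough
            elif s in reload:
                new = 0  # arriving here refills the resource to `cap`
            else:
                new = best
            if new != need[s]:
                need[s] = new
                changed = True
    return need


def shield(model: Model, need: Dict[str, float], reload: Set[str], cap: int,
           state: str, level: int) -> List[str]:
    """Actions that cannot lead to resource exhaustion from (state, level)."""
    available = cap if state in reload else level
    return [a for a in actions_of(model, state)
            if action_need(model, need, state, a) <= available]


# Toy scenario: base (reload) -> a -> b -> goal, with the option to turn back.
CAP, RELOAD = 3, {"base"}
MODEL: Model = {
    ("base", "go"): (1, ["a"]),
    ("a", "fwd"): (1, ["b"]),
    ("a", "back"): (1, ["base"]),
    ("b", "fwd"): (2, ["goal"]),
    ("b", "back"): (1, ["a"]),
    ("goal", "stay"): (0, ["goal"]),
}
need = min_safe_levels(MODEL, RELOAD, CAP)
print(need["a"], need["b"])                      # 1 2
print(shield(MODEL, need, RELOAD, CAP, "a", 1))  # ['back']
print(shield(MODEL, need, RELOAD, CAP, "a", 3))  # ['fwd', 'back']
```

Values above `cap` are treated as infinite, so the upward iteration passes through finitely many values and terminates. A cost-free loop (such as `goal`) needs no resource, while a state whose only option is a resource-consuming loop without a reload state ends up at `math.inf`.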
Taxonomy
Topics: Reinforcement Learning in Robotics · Bayesian Modeling and Causal Inference · Formal Methods in Verification
