Enforcing Almost-Sure Reachability in POMDPs

Sebastian Junges; Nils Jansen; Sanjit A. Seshia

arXiv:2007.00085·cs.AI·March 22, 2021

Enforcing Almost-Sure Reachability in POMDPs

Sebastian Junges, Nils Jansen, Sanjit A. Seshia

PDF

1 Repo

TL;DR

This paper addresses the complex problem of synthesizing policies in POMDPs that guarantee reaching a goal state almost surely while avoiding bad states, introducing two novel algorithms and demonstrating their effectiveness.

Contribution

The paper introduces two new algorithms—a SAT-based iterative method and a decision-diagram approach—for computing winning regions in POMDPs with almost-sure reachability.

Findings

01

Algorithms are feasible and effective in practice.

02

Empirical evaluation confirms the approaches' efficiency.

03

Provides tools for safe exploration in POMDPs.

Abstract

Partially-Observable Markov Decision Processes (POMDPs) are a well-known stochastic model for sequential decision making under limited information. We consider the EXPTIME-hard problem of synthesising policies that almost-surely reach some goal state without ever visiting a bad state. In particular, we are interested in computing the winning region, that is, the set of system configurations from which a policy exists that satisfies the reachability specification. A direct application of such a winning region is the safe exploration of POMDPs by, for instance, restricting the behavior of a reinforcement learning agent to the region. We present two algorithms: A novel SAT-based iterative approach and a decision-diagram based alternative. The empirical evaluation demonstrates the feasibility and efficacy of the approaches.

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

sjunges/shielding-POMDPs
noneOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.