Strengthening Deterministic Policies for POMDPs

Leonore Winterer; Ralf Wimmer; Nils Jansen; Bernd Becker

arXiv:2007.08351·cs.AI·July 20, 2020

TL;DR

This paper introduces a mixed-integer linear programming (MILP) encoding that computes stationary deterministic policies for POMDPs under an arbitrary number of temporal logic specifications, and extends it to produce randomized policies.

Contribution

The authors develop a new MILP encoding that supports temporal logic specifications and extend it to produce randomized policies.

Findings

01  Supports multiple temporal logic specifications.

02  Enables strengthening deterministic policies efficiently.

03  Demonstrates effectiveness on various benchmarks.

Abstract

The synthesis problem for partially observable Markov decision processes (POMDPs) is to compute a policy that satisfies a given specification. Such policies have to take the full execution history of a POMDP into account, rendering the problem undecidable in general. A common approach is to use a limited amount of memory and randomize over potential choices. Yet, this problem is still NP-hard and often computationally intractable in practice. A restricted problem is to use neither history nor randomization, yielding policies that are called stationary and deterministic. Previous approaches to compute such policies employ mixed-integer linear programming (MILP). We provide a novel MILP encoding that supports sophisticated specifications in the form of temporal logic constraints. It is able to handle an arbitrary number of such specifications. Yet, randomization and memory are often…
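To make the notion of a stationary deterministic policy concrete, here is a minimal sketch that assumes a hypothetical five-state toy POMDP (its states, observations, and probabilities are invented for illustration and do not come from the paper). Such a policy is simply a map from observations to actions, so states that share an observation must share an action. The sketch finds the best such policy by brute-force enumeration rather than by the paper's MILP encoding; with |A| actions and |O| observations there are |A|^|O| candidates, so enumeration is only viable for very small models.

```python
from itertools import product

# Hypothetical toy POMDP (illustrative only, not from the paper).
# TRANSITIONS[state][action] = list of (next_state, probability).
TRANSITIONS = {
    "s0":   {"a": [("s1", 0.7), ("s2", 0.3)], "b": [("s1", 0.7), ("s2", 0.3)]},
    "s1":   {"a": [("goal", 1.0)], "b": [("bad", 1.0)]},
    "s2":   {"a": [("bad", 1.0)],  "b": [("goal", 1.0)]},
    "goal": {"a": [("goal", 1.0)], "b": [("goal", 1.0)]},
    "bad":  {"a": [("bad", 1.0)],  "b": [("bad", 1.0)]},
}
# s1 and s2 emit the same observation, so the agent cannot tell them apart.
OBSERVATION = {"s0": "init", "s1": "fork", "s2": "fork", "goal": "done", "bad": "done"}
ACTIONS = ["a", "b"]


def reach_probability(policy, start="s0", target="goal", sweeps=100):
    """Probability of eventually reaching `target` in the Markov chain
    induced by an observation-based policy (value iteration from below)."""
    value = {s: 1.0 if s == target else 0.0 for s in TRANSITIONS}
    for _ in range(sweeps):
        for s in TRANSITIONS:
            if s == target:
                continue
            action = policy[OBSERVATION[s]]  # the choice depends only on the observation
            value[s] = sum(p * value[t] for t, p in TRANSITIONS[s][action])
    return value[start]


def best_stationary_deterministic_policy():
    """Enumerate every observation -> action map and keep the best one."""
    observations = sorted(set(OBSERVATION.values()))
    best = None
    for choice in product(ACTIONS, repeat=len(observations)):
        policy = dict(zip(observations, choice))
        prob = reach_probability(policy)
        if best is None or prob > best[1]:
            best = (policy, prob)
    return best


if __name__ == "__main__":
    policy, prob = best_stationary_deterministic_policy()
    print(policy, prob)
```

In this toy model the agent must commit to one action for the shared observation "fork", so it succeeds only in whichever of s1 or s2 that action suits. A policy that could randomize or remember history is not bound by this restriction, which is the limitation the abstract refers to.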
