POMDPs under Probabilistic Semantics

Krishnendu Chatterjee; Martin Chmelik

arXiv:1408.2058·cs.AI·August 12, 2014

TL;DR

This paper studies POMDPs with limit-average (long-run average) rewards and asks whether a controller can satisfy a path constraint with probability 1, establishing complexity and decidability results for the qualitative and quantitative versions of that constraint.

Contribution

It provides the first complexity and decidability results for POMDPs with limit-average payoff and path constraints, highlighting the challenges in controller synthesis.

Findings

01

Deciding whether a finite-memory controller exists for the qualitative constraint is EXPTIME-complete.

02

Deciding whether an infinite-memory controller exists for the qualitative constraint is undecidable.

03

Deciding whether a finite-memory controller exists for the quantitative constraint is undecidable.

Abstract

We consider partially observable Markov decision processes (POMDPs) with limit-average payoff, where a reward value in the interval $[0,1]$ is associated with every transition, and the payoff of an infinite path is the long-run average of the rewards. We consider two types of path constraints: (i) a quantitative constraint, which defines the set of paths where the payoff is at least a given threshold $\lambda_1 \in (0,1]$; and (ii) a qualitative constraint, which is the special case of the quantitative constraint with $\lambda_1 = 1$. We consider the computation of the almost-sure winning set, where the controller needs to ensure that the path constraint is satisfied with probability 1. Our main results for the qualitative path constraint are as follows: (i) the problem of deciding the existence of a finite-memory controller is EXPTIME-complete; and (ii) the problem of deciding the existence of an infinite-memory…
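To make the two path constraints concrete, here is a minimal Python sketch that evaluates them on a single eventually periodic path, that is, a finite prefix of rewards followed by a cycle repeated forever. The function names `lasso_limit_average` and `satisfies_constraint` are hypothetical and do not come from the paper. The sketch covers only the payoff of one path; it does not model observations, controllers, or the probability-1 requirement of the almost-sure winning set.

```python
from fractions import Fraction
from typing import Sequence, Union

Reward = Union[int, Fraction]


def lasso_limit_average(prefix: Sequence[Reward], cycle: Sequence[Reward]) -> Fraction:
    """Limit-average payoff of the infinite path: prefix, then cycle forever.

    The finite prefix is averaged away in the limit, so the payoff equals
    the mean reward along one pass of the cycle.
    """
    if not cycle:
        raise ValueError("cycle must contain at least one reward")
    for r in (*prefix, *cycle):
        if not 0 <= r <= 1:
            raise ValueError(f"reward {r} lies outside [0, 1]")
    return sum(Fraction(r) for r in cycle) / len(cycle)


def satisfies_constraint(payoff: Fraction, threshold: Reward = Fraction(1)) -> bool:
    """Quantitative constraint: payoff >= threshold, with threshold in (0, 1].

    The default threshold of 1 gives the qualitative constraint.
    """
    if not 0 < threshold <= 1:
        raise ValueError("threshold must lie in (0, 1]")
    return payoff >= threshold


# Three zero-reward steps, then reward 1 forever: the zeros vanish in the limit.
late_starter = lasso_limit_average([0, 0, 0], [1])

# Alternating rewards 1, 0, 1, 0, ... after an initial reward of 1.
alternating = lasso_limit_average([1], [1, 0])

print(late_starter, satisfies_constraint(late_starter))
print(alternating, satisfies_constraint(alternating, Fraction(1, 2)))
```

The first example shows that a path can meet the qualitative constraint without every transition having reward 1: any finite number of low-reward steps has no effect on the long-run average.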
