Multi-Environment POMDPs with Finite-Horizon Objectives
Léonard Brice, Filip Cano, Krishnendu Chatterjee, Thomas A. Henzinger, Stefanie Muroya

TL;DR
This paper studies multi-environment POMDPs (MEPOMDPs) with finite-horizon objectives. It proves that computing the optimal value and policy is PSPACE-complete, and introduces a practical algorithm that significantly outperforms the only previously known algorithm on classical benchmarks.
Contribution
It establishes that computing the optimal value and policy in finite-horizon MEPOMDPs is PSPACE-complete, and provides a practical algorithm with an empirical evaluation on classical benchmarks.
Findings
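Computing the optimal value and policy in finite-horizon MEPOMDPs is PSPACE-complete, the same complexity as in ordinary POMDPs
The new algorithm significantly outperforms the only previously known algorithm on classical benchmarks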
Empirical results demonstrate practical effectiveness
Abstract
Partially Observable Markov Decision Processes (POMDPs) are systems in which one agent interacts with a stochastic environment, and receives only partial information about the current state. In a multi-environment POMDP (MEPOMDP), the initial state is unknown, and assumed to be adversarially chosen. In this work we focus on computing the optimal value and policy in MEPOMDPs with finite-horizon objectives. That problem is known to be PSPACE-complete in POMDPs. Our main results are as follows: (1) we establish that it is also PSPACE-complete in the more general setting of MEPOMDPs; (2) we present a practical algorithm and evaluate it on classical benchmarks, significantly outperforming the only previously known algorithm.
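To make the adversarial initial state concrete, here is a minimal sketch of how the worst-case finite-horizon value of one fixed policy could be evaluated on a toy model. It is not the algorithm from the paper. The two-state model, the reward function, the single uninformative observation, and all names (`transition`, `observe`, `reward`, `value_from`, `worst_case_value`) are assumptions made purely for illustration.

```python
from typing import Callable, Dict, Tuple

# A history is the sequence of actions and observations seen so far.
History = Tuple[int, ...]
# A policy maps a history to a probability distribution over actions.
Policy = Callable[[History], Dict[int, float]]

# Hypothetical toy model: two candidate initial states, two actions.
STATES = [0, 1]


def transition(state: int, action: int) -> Dict[int, float]:
    """Assumed dynamics: the state never changes."""
    return {state: 1.0}


def observe(state: int) -> Dict[int, float]:
    """Assumed observations: a single symbol, so the agent learns nothing."""
    return {0: 1.0}


def reward(state: int, action: int) -> float:
    """Assumed reward: 1 when the action matches the hidden state."""
    return 1.0 if action == state else 0.0


def value_from(state: int, history: History, steps: int, policy: Policy) -> float:
    """Expected total reward over `steps` remaining steps, starting in `state`."""
    if steps == 0:
        return 0.0
    total = 0.0
    for action, p_action in policy(history).items():
        future = 0.0
        for nxt, p_next in transition(state, action).items():
            for obs, p_obs in observe(nxt).items():
                future += p_next * p_obs * value_from(
                    nxt, history + (action, obs), steps - 1, policy
                )
        total += p_action * (reward(state, action) + future)
    return total


def worst_case_value(policy: Policy, horizon: int) -> float:
    """The adversary picks the initial state that minimizes the expected reward."""
    return min(value_from(s, (), horizon, policy) for s in STATES)


def always_zero(history: History) -> Dict[int, float]:
    return {0: 1.0}


def uniform(history: History) -> Dict[int, float]:
    return {0: 0.5, 1: 0.5}


print(worst_case_value(always_zero, 1))  # 0.0
print(worst_case_value(uniform, 1))      # 0.5
```

In this toy model the deterministic policy is worth 0 in the worst case, because the adversary starts in the state it guesses wrong, while the uniformly random policy guarantees 0.5 whichever state is chosen. Computing the optimal value means maximizing this worst-case quantity over all policies, which the sketch does not attempt.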
