Multi-Environment POMDPs with Finite-Horizon Objectives

Léonard Brice; Filip Cano; Krishnendu Chatterjee; Thomas A. Henzinger; Stefanie Muroya

arXiv:2605.07537·cs.AI·May 11, 2026

TL;DR

This paper studies multi-environment POMDPs (MEPOMDPs) with finite-horizon objectives. It proves that computing the optimal value and policy is PSPACE-complete, and it introduces a practical algorithm that significantly outperforms the only previously known algorithm on classical benchmarks.

Contribution

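It establishes that computing the optimal value and policy in finite-horizon MEPOMDPs is PSPACE-complete, the same complexity as in ordinary POMDPs, and provides a new practical algorithm with an empirical evaluation.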

Findings

01

Computing the optimal value and policy in finite-horizon MEPOMDPs is PSPACE-complete, matching the known result for POMDPs

02

New algorithm significantly outperforms the only previously known algorithm

03

The evaluation is carried out on classical benchmarks

Abstract

Partially Observable Markov Decision Processes (POMDPs) are systems in which one agent interacts with a stochastic environment, and receives only partial information about the current state. In a multi-environment POMDP (MEPOMDP), the initial state is unknown, and assumed to be adversarially chosen. In this work we focus on computing the optimal value and policy in MEPOMDPs with finite-horizon objectives. That problem is known to be PSPACE-complete in POMDPs. Our main results are as follows: (1) we establish that it is also PSPACE-complete in the more general setting of MEPOMDPs; (2) we present a practical algorithm and evaluate it on classical benchmarks, significantly outperforming the only previously known algorithm.
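To make the adversarial choice of initial state concrete, the following is a minimal sketch of how a fixed policy could be scored in a tiny finite-horizon MEPOMDP. It is not the algorithm from the paper. The toy model `TOY`, the transition encoding, and the names `value`, `worst_case_value`, `always_a`, and `alternate` are all hypothetical and chosen only for illustration. The sketch evaluates a given policy; it does not search for an optimal one.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical encoding: trans[state][action] is a list of
# (probability, next_state, observation, reward) outcomes.
Outcome = Tuple[float, int, str, float]
Trans = Dict[int, Dict[str, List[Outcome]]]
History = Tuple[Tuple[str, str], ...]      # sequence of (action, observation)
Policy = Callable[[History], str]          # the policy sees only the history


def value(trans: Trans, policy: Policy, state: int,
          history: History, horizon: int) -> float:
    """Expected total reward over `horizon` steps from a hidden `state`."""
    if horizon == 0:
        return 0.0
    action = policy(history)
    total = 0.0
    for prob, nxt, obs, reward in trans[state][action]:
        future = value(trans, policy, nxt,
                       history + ((action, obs),), horizon - 1)
        total += prob * (reward + future)
    return total


def worst_case_value(trans: Trans, policy: Policy,
                     initial_states: List[int], horizon: int) -> float:
    """Value of `policy` when an adversary picks the initial state."""
    return min(value(trans, policy, s, (), horizon) for s in initial_states)


# Toy model (an assumption): two absorbing states with an uninformative
# observation. Action "a" pays 1 in state 0, action "b" pays 1 in state 1.
TOY: Trans = {
    0: {"a": [(1.0, 0, "none", 1.0)], "b": [(1.0, 0, "none", 0.0)]},
    1: {"a": [(1.0, 1, "none", 0.0)], "b": [(1.0, 1, "none", 1.0)]},
}


def always_a(history: History) -> str:
    return "a"


def alternate(history: History) -> str:
    # Play "a" on even steps and "b" on odd steps.
    return "a" if len(history) % 2 == 0 else "b"


if __name__ == "__main__":
    print(worst_case_value(TOY, always_a, [0, 1], 2))
    print(worst_case_value(TOY, alternate, [0, 1], 2))
```

In this toy model, `always_a` collects reward only if the hidden state is 0, so its worst case over both initial states is 0, whereas `alternate` earns 1 from either state. The recursion enumerates every observation history, so its cost grows exponentially with the horizon; it is meant to illustrate the objective, not to be practical.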
