Finite-State Controllers for (Hidden-Model) POMDPs using Deep Reinforcement Learning

David Hudák; Maris F. L. Galesloot; Martin Tappler; Martin Kurečka; Nils Jansen; Milan Češka

arXiv:2602.08734·cs.AI·April 2, 2026

TL;DR

This paper introduces Lexpop, a framework combining deep reinforcement learning and finite-state controllers to solve large-scale POMDPs and HM-POMDPs with performance guarantees.

Contribution

Lexpop is the first method to extract formally evaluable finite-state controllers from neural policies for POMDPs and to extend the approach to policies that are robust across finite sets of POMDPs (HM-POMDPs).

Findings

01. Lexpop outperforms existing solvers on problems with large state spaces.

02. Unlike neural policies, the extracted finite-state controllers can be formally evaluated, which yields performance guarantees.

03. Robust policies handle finite sets of POMDPs (HM-POMDPs).

Abstract

Solving partially observable Markov decision processes (POMDPs) requires computing policies under imperfect state information. Despite recent advances, the scalability of existing POMDP solvers remains limited. Moreover, many settings require a policy that is robust across multiple POMDPs, further aggravating the scalability issue. We propose the Lexpop framework for POMDP solving. Lexpop (1) employs deep reinforcement learning to train a neural policy, represented by a recurrent neural network, and (2) constructs a finite-state controller mimicking the neural policy through efficient extraction methods. Crucially, unlike neural policies, such controllers can be formally evaluated, providing performance guarantees. We extend Lexpop to compute robust policies for hidden-model POMDPs (HM-POMDPs), which describe finite sets of POMDPs. We associate every extracted controller with its…
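
To illustrate why a finite-state controller can be evaluated exactly while a neural policy cannot, here is a minimal sketch of controller evaluation on the product of POMDP states and controller nodes. It is not the paper's implementation. The dictionary layout, the function names `evaluate_fsc` and `robust_value`, the deterministic state-based observations, the discounted-reward objective, and the two toy models are all assumptions made for this example.

```python
def evaluate_fsc(pomdp, fsc, gamma=0.95, tol=1e-10, max_iter=100_000):
    """Expected discounted reward of a deterministic FSC, per (state, node) pair.

    Assumed (hypothetical) layout:
      pomdp["obs"][s]       -> observation emitted in state s
      pomdp["trans"][s][a]  -> dict {next_state: probability}
      pomdp["reward"][s][a] -> immediate reward
      fsc["act"][n][o]      -> action chosen in node n on observation o
      fsc["upd"][n][o]      -> next node after observing o in node n
    """
    states = range(len(pomdp["obs"]))
    nodes = range(len(fsc["act"]))
    value = {(s, n): 0.0 for s in states for n in nodes}
    for _ in range(max_iter):
        delta = 0.0
        for s in states:
            o = pomdp["obs"][s]
            for n in nodes:
                a = fsc["act"][n][o]
                n_next = fsc["upd"][n][o]
                # Bellman backup on the Markov chain induced by the FSC.
                new = pomdp["reward"][s][a] + gamma * sum(
                    p * value[(s_next, n_next)]
                    for s_next, p in pomdp["trans"][s][a].items()
                )
                delta = max(delta, abs(new - value[(s, n)]))
                value[(s, n)] = new
        if delta < tol:
            break
    return value


def robust_value(pomdps, fsc, init_state=0, init_node=0, gamma=0.95):
    """Worst-case value of one FSC over a finite set of POMDPs."""
    return min(
        evaluate_fsc(m, fsc, gamma=gamma)[(init_state, init_node)] for m in pomdps
    )


# Toy models: two hidden states with the same observation.
# Action 0 stays in place, action 1 tries to flip the state.
# Reward 1 is earned in state 1, reward 0 in state 0.
pomdp_a = {
    "obs": [0, 0],
    "trans": [[{0: 1.0}, {1: 1.0}], [{1: 1.0}, {0: 1.0}]],
    "reward": [[0.0, 0.0], [1.0, 1.0]],
}
# Same model, but the flip action only succeeds half of the time.
pomdp_b = {
    "obs": [0, 0],
    "trans": [[{0: 1.0}, {0: 0.5, 1: 0.5}], [{1: 1.0}, {0: 0.5, 1: 0.5}]],
    "reward": [[0.0, 0.0], [1.0, 1.0]],
}
# Two-node controller: flip once (node 0), then stay forever (node 1).
fsc = {"act": [[1], [0]], "upd": [[1], [1]]}

value_a = evaluate_fsc(pomdp_a, fsc, gamma=0.5)[(0, 0)]
worst_case = robust_value([pomdp_a, pomdp_b], fsc, gamma=0.5)
```

With a discount factor of 0.5, the controller's value from state 0 should be close to 1.0 on the first model and close to 0.5 on the second, so the worst-case value over the pair is the latter. Because the controller has finitely many nodes, the induced chain is finite and this computation is exact up to the tolerance, which is the property a recurrent neural policy lacks.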
