Finite-State Controllers for (Hidden-Model) POMDPs using Deep Reinforcement Learning
David Hudák, Maris F. L. Galesloot, Martin Tappler, Martin Kurečka, Nils Jansen, Milan Češka

TL;DR
This paper introduces Lexpop, a framework combining deep reinforcement learning and finite-state controllers to solve large-scale POMDPs and HM-POMDPs with performance guarantees.
Contribution
Lexpop is the first method to extract formally evaluable finite-state controllers from neural policies for POMDPs, and it extends this extraction to compute policies that are robust across multiple models.
Findings
Lexpop outperforms existing solvers on problems with large state spaces.
Finite-state controllers enable formal performance evaluation.
Robust policies effectively handle sets of POMDPs.
Abstract
Solving partially observable Markov decision processes (POMDPs) requires computing policies under imperfect state information. Despite recent advances, the scalability of existing POMDP solvers remains limited. Moreover, many settings require a policy that is robust across multiple POMDPs, further aggravating the scalability issue. We propose the Lexpop framework for POMDP solving. Lexpop (1) employs deep reinforcement learning to train a neural policy, represented by a recurrent neural network, and (2) constructs a finite-state controller mimicking the neural policy through efficient extraction methods. Crucially, unlike neural policies, such controllers can be formally evaluated, providing performance guarantees. We extend Lexpop to compute robust policies for hidden-model POMDPs (HM-POMDPs), which describe finite sets of POMDPs. We associate every extracted controller with its…
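To illustrate why a finite-state controller can be evaluated exactly while a neural policy cannot, the sketch below computes the value of a controller on a small POMDP by iterating over the induced Markov chain of (state, memory node) pairs, and then takes the minimum over a finite set of POMDPs. It is a minimal illustration under stated assumptions, not the Lexpop implementation: the names `POMDP`, `FSC`, `evaluate_fsc`, and `worst_case_value` are hypothetical, observations are assumed to be a deterministic function of the state, and a discounted-reward objective is assumed for simplicity (the paper may use a different objective and a model checker).

```python
from dataclasses import dataclass


@dataclass
class POMDP:
    trans: dict   # (state, action) -> list of (next_state, probability)
    reward: dict  # (state, action) -> immediate reward
    obs: dict     # state -> observation (deterministic: an assumption of this sketch)
    init: int     # initial state


@dataclass
class FSC:
    action: dict  # (node, observation) -> action to play
    update: dict  # (node, observation) -> next memory node
    init: int     # initial memory node


def evaluate_fsc(pomdp, fsc, gamma=0.95, tol=1e-10, max_iter=100_000):
    """Discounted value of the controller from the initial (state, node) pair.

    The controller and the POMDP together induce a Markov chain over
    (state, node) pairs; its value is the fixed point of a linear system,
    approximated here by Gauss-Seidel iteration.
    """
    states = list(pomdp.obs)
    nodes = sorted({n for n, _ in fsc.action})
    value = {(s, n): 0.0 for s in states for n in nodes}
    for _ in range(max_iter):
        delta = 0.0
        for (s, n) in value:
            o = pomdp.obs[s]
            a = fsc.action[(n, o)]
            n2 = fsc.update[(n, o)]
            new = pomdp.reward[(s, a)] + gamma * sum(
                p * value[(s2, n2)] for s2, p in pomdp.trans[(s, a)]
            )
            delta = max(delta, abs(new - value[(s, n)]))
            value[(s, n)] = new
        if delta < tol:
            break
    return value[(pomdp.init, fsc.init)]


def worst_case_value(pomdps, fsc, gamma=0.95):
    """Robust value: the minimum value of one controller over a finite set of POMDPs."""
    return min(evaluate_fsc(m, fsc, gamma) for m in pomdps)


# Two toy models that share states, actions, and observations but differ in rewards.
# The single action 0 swaps the two states; both states emit the same observation.
swap = {(0, 0): [(1, 1.0)], (1, 0): [(0, 1.0)]}
m1 = POMDP(trans=swap, reward={(0, 0): 1.0, (1, 0): 0.0}, obs={0: 0, 1: 0}, init=0)
m2 = POMDP(trans=swap, reward={(0, 0): 0.0, (1, 0): 1.0}, obs={0: 0, 1: 0}, init=0)

# A one-node controller that always plays action 0.
fsc = FSC(action={(0, 0): 0}, update={(0, 0): 0}, init=0)

v1 = evaluate_fsc(m1, fsc, gamma=0.5)
v2 = evaluate_fsc(m2, fsc, gamma=0.5)
robust = worst_case_value([m1, m2], fsc, gamma=0.5)
```

Because the controller has finitely many memory nodes, the product with the POMDP is a finite Markov chain, so its value is well defined and can be computed to any precision. A recurrent neural network has a continuous hidden state and admits no such finite product, which is why the extraction step matters for obtaining guarantees.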
