Nonapproximability Results for Partially Observable Markov Decision Processes
J. Goldsmith, C. Lusena, M. Mundhenk

TL;DR
For several variations of partially observable Markov decision processes (POMDPs), polynomial-time algorithms cannot guarantee policies within a constant factor or a constant summand of optimal unless P=NP, P=PSPACE, or P=EXP.
Contribution
It establishes nonapproximability results for finding control policies in several POMDP variants, tying the hardness of approximation to the assumptions P≠NP, P≠PSPACE, or P≠EXP.
Findings
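For several POMDP variants, polynomial-time algorithms are unlikely to guarantee, or simply cannot guarantee, policies within a constant factor or a constant summand of optimal.
Where the result is conditional, a polynomial-time algorithm with such a guarantee would imply one of the collapses P=NP, P=PSPACE, or P=EXP.
Until such a collapse is shown to hold, a control-policy designer must choose between performance guarantees and efficient computation.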
Abstract
We show that for several variations of partially observable Markov decision processes, polynomial-time algorithms for finding control policies are unlikely to have, or simply do not have, guarantees of finding policies within a constant factor or a constant summand of optimal. Here "unlikely" means "unless some complexity classes collapse," where the collapses considered are P=NP, P=PSPACE, or P=EXP. Until or unless these collapses are shown to hold, any control-policy designer must choose between such performance guarantees and efficient computation.
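The two guarantees named in the abstract can be illustrated with a small sketch. It assumes a maximization setting with nonnegative policy values; the function names and the parameters `c` and `k` are hypothetical and chosen here for illustration, and the paper's formal definitions may be stated differently.

```python
def within_constant_factor(value: float, optimal: float, c: float) -> bool:
    """Multiplicative guarantee: the policy's value is at least optimal / c.

    Assumes nonnegative values and a constant c >= 1 (an illustrative
    convention, not necessarily the paper's).
    """
    if c < 1:
        raise ValueError("c must be at least 1")
    return value * c >= optimal


def within_constant_summand(value: float, optimal: float, k: float) -> bool:
    """Additive guarantee: the policy's value falls short of optimal by at most k."""
    if k < 0:
        raise ValueError("k must be nonnegative")
    return optimal - value <= k


# A policy worth 5 against an optimum of 10 meets a factor-2 guarantee
# but misses an additive guarantee of 2.
print(within_constant_factor(5.0, 10.0, 2.0))   # True
print(within_constant_summand(5.0, 10.0, 2.0))  # False
```

The nonapproximability results concern whether any polynomial-time algorithm can promise that one of these checks holds for every input instance, not whether it holds on a particular instance.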
