Future memories are not needed for large classes of POMDPs
Victor Cohen, Axel Parmentier

TL;DR
This paper demonstrates that optimal memoryless policies for POMDPs can be efficiently computed and used within a model predictive control framework to approximate history-dependent policies, challenging the belief that future memories are always necessary.
Contribution
It introduces an efficient MILP-based method for computing optimal memoryless policies and a model predictive control approach using these policies to approximate optimal history-dependent solutions.
Findings
Memoryless policies can be computed efficiently using MILP.
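The linear relaxation of the MILP, strengthened with valid inequalities, provides high-quality upper bounds on the value of an optimal history-dependent policy.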
The SMF policy performs well on benchmark instances.
Abstract
Optimal policies for partially observed Markov decision processes (POMDPs) are history-dependent: decisions are made based on the entire history of observations. Memoryless policies, which take decisions based on the last observation only, are generally considered useless in the literature because we can construct POMDP instances for which optimal memoryless policies are arbitrarily worse than history-dependent ones. Our purpose is to challenge this belief. We show that optimal memoryless policies can be computed efficiently using mixed integer linear programming (MILP), and perform reasonably well on a wide range of instances from the literature. When strengthened with valid inequalities, the linear relaxation of this MILP provides high-quality upper bounds on the value of an optimal history-dependent policy. Furthermore, when used with a finite horizon POMDP problem with memoryless…
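To make the notion of an optimal memoryless policy concrete, here is a minimal sketch that finds one by brute-force enumeration on a toy finite-horizon POMDP. This is not the paper's MILP: it simply tries every deterministic map from last observation to action, so it is only viable for tiny instances. The function names, the data layout (`p0`, `P`, `E`, `r`), the toy instance, and the restriction to a time-independent policy are all assumptions made for illustration.

```python
from itertools import product


def evaluate(policy, p0, P, E, r, horizon):
    """Expected total reward of a deterministic memoryless policy.

    policy[o] is the action taken whenever the last observation is o.
    p0[s]       : initial state distribution
    P[s][a][s2] : transition probability from s to s2 under action a
    E[s][o]     : probability of observing o in state s
    r[s][a]     : immediate reward for taking action a in state s
    """
    n_states = len(p0)
    mu = list(p0)  # state distribution at the current time step
    total = 0.0
    for _ in range(horizon):
        nxt = [0.0] * n_states
        for s in range(n_states):
            for o, a in enumerate(policy):
                w = mu[s] * E[s][o]  # probability of being in s and observing o
                total += w * r[s][a]
                for s2 in range(n_states):
                    nxt[s2] += w * P[s][a][s2]
        mu = nxt
    return total


def best_memoryless(p0, P, E, r, horizon):
    """Enumerate all |A|^|O| deterministic memoryless policies (toy sizes only)."""
    n_obs = len(E[0])
    n_actions = len(r[0])
    candidates = product(range(n_actions), repeat=n_obs)
    best = max(candidates, key=lambda pi: evaluate(pi, p0, P, E, r, horizon))
    return evaluate(best, p0, P, E, r, horizon), best


# Hypothetical toy instance: 2 states, 2 noisy observations, 2 actions.
# The state never changes, and the reward is 1 when the action matches the state.
p0 = [0.5, 0.5]
P = [[[1.0, 0.0], [1.0, 0.0]],
     [[0.0, 1.0], [0.0, 1.0]]]
E = [[0.8, 0.2],
     [0.2, 0.8]]
r = [[1.0, 0.0],
     [0.0, 1.0]]

if __name__ == "__main__":
    value, policy = best_memoryless(p0, P, E, r, horizon=3)
    print("best memoryless policy (obs -> action):", policy)
    print("expected total reward:", value)
```

On this toy instance a history-dependent policy could do better by averaging several noisy observations of the fixed state, which is the kind of gap the abstract refers to. The enumeration grows exponentially in the number of observations, which is why an exact formulation such as the paper's MILP is needed beyond toy sizes.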
