Future memories are not needed for large classes of POMDPs

Victor Cohen; Axel Parmentier

arXiv:2205.02580 · math.OC · May 6, 2022 · Oper. Res. Lett.

TL;DR

This paper demonstrates that optimal memoryless policies for POMDPs can be efficiently computed and used within a model predictive control framework to approximate history-dependent policies, challenging the belief that future memories are always necessary.

Contribution

It introduces an efficient MILP-based method for computing optimal memoryless policies and a model predictive control approach using these policies to approximate optimal history-dependent solutions.

Findings

1. Memoryless policies can be computed efficiently using MILP.

2. The MILP relaxation provides high-quality upper bounds.

3. The SMF policy performs well on benchmark instances.

Abstract

Optimal policies for partially observed Markov decision processes (POMDPs) are history-dependent: decisions are made based on the entire history of observations. Memoryless policies, which take decisions based on the last observation only, are generally considered useless in the literature because we can construct POMDP instances for which optimal memoryless policies are arbitrarily worse than history-dependent ones. Our purpose is to challenge this belief. We show that optimal memoryless policies can be computed efficiently using mixed integer linear programming (MILP), and perform reasonably well on a wide range of instances from the literature. When strengthened with valid inequalities, the linear relaxation of this MILP provides high-quality upper bounds on the value of an optimal history-dependent policy. Furthermore, when used with a finite-horizon POMDP problem with memoryless…
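To make the notion of a memoryless policy concrete, here is a minimal Python sketch of how one might evaluate such a policy on a small finite-horizon POMDP and pick the best one. It is an illustration only: the names `evaluate`, `best_memoryless`, `p0`, `P`, `Z` and `r` are hypothetical, the toy instance is invented, and the search is a brute-force enumeration of deterministic, time-dependent policies, not the MILP formulation of the paper.

```python
from itertools import product


def evaluate(policy, p0, P, Z, r):
    """Expected total reward of a deterministic memoryless policy.

    Assumed (hypothetical) layout: policy[t][o] is the action taken at
    step t after observing o, P[s][a][s2] is a transition probability,
    Z[s][o] an observation probability, and r[s][a] an immediate reward.
    """
    mu = list(p0)  # distribution over hidden states at the current step
    total = 0.0
    for step_rule in policy:
        nxt = [0.0] * len(mu)
        for s, p_s in enumerate(mu):
            for o, p_o in enumerate(Z[s]):
                a = step_rule[o]  # depends on the last observation only
                w = p_s * p_o  # probability of being in s and observing o
                total += w * r[s][a]
                for s2, p_next in enumerate(P[s][a]):
                    nxt[s2] += w * p_next
        mu = nxt
    return total


def best_memoryless(T, p0, P, Z, r):
    """Brute-force search, a stand-in for the MILP; tiny instances only."""
    n_obs, n_act = len(Z[0]), len(r[0])
    rules = list(product(range(n_act), repeat=n_obs))  # one rule per step
    best = max(product(rules, repeat=T),
               key=lambda pol: evaluate(pol, p0, P, Z, r))
    return best, evaluate(best, p0, P, Z, r)


# Invented toy instance: two static states, reward 1 for guessing the state.
p0 = [0.5, 0.5]
P = [[[1.0, 0.0], [1.0, 0.0]],
     [[0.0, 1.0], [0.0, 1.0]]]
r = [[1.0, 0.0],
     [0.0, 1.0]]
Z_exact = [[1.0, 0.0], [0.0, 1.0]]  # the observation reveals the state
Z_blind = [[0.5, 0.5], [0.5, 0.5]]  # the observation carries no information

policy, value = best_memoryless(2, p0, P, Z_exact, r)
_, blind_value = best_memoryless(2, p0, P, Z_blind, r)
```

Because the action depends only on the current observation, the hidden state evolves as a Markov chain under a fixed memoryless policy, which is why the sketch only needs to propagate the state distribution `mu`. The enumeration grows exponentially with the horizon and the number of observations, so it is useful only as a sanity check on very small instances.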
