Solving POMDPs by Searching the Space of Finite Policies
Nicolas Meuleau, Kee-Eung Kim, Leslie Pack Kaelbling, Anthony R. Cassandra

TL;DR
This paper approximately solves POMDPs by searching a restricted set of policies, namely finite-state automata of a given size, and shows that the search becomes much cheaper when the POMDP or the policy is further constrained.
Contribution
It proposes a framework for finding the best policy among finite-state automata of a given size, with a branch-and-bound algorithm for deterministic policies and a gradient-ascent algorithm for stochastic policies.
Findings
Branch-and-bound method finds globally optimal deterministic policies
Gradient-ascent method finds locally optimal stochastic policies
Both methods give good empirical results
Abstract
Solving partially observable Markov decision processes (POMDPs) is highly intractable in general, at least in part because the optimal policy may be infinitely large. In this paper, we explore the problem of finding the optimal policy from a restricted set of policies, represented as finite state automata of a given size. This problem is also intractable, but we show that the complexity can be greatly reduced when the POMDP and/or policy are further constrained. We demonstrate good empirical results with a branch-and-bound method for finding globally optimal deterministic policies, and a gradient-ascent method for finding locally optimal stochastic policies.
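To make the idea of a finite-state policy concrete, the sketch below evaluates a fixed stochastic controller on a small POMDP by solving the linear system of the Markov chain over (controller node, hidden state) pairs. This is a standard evaluation technique and not necessarily the paper's exact formulation. The array names (`T`, `Z`, `R`, `psi`, `eta`), their layouts, and the assumption that observations depend only on the next state are illustrative choices, not taken from the paper.

```python
import numpy as np

def evaluate_controller(T, Z, R, psi, eta, gamma):
    """Value V[n, s] of starting in controller node n and hidden state s.

    Assumed (hypothetical) array layouts:
      T[s, a, s2]   state transition probabilities
      Z[s2, o]      observation probabilities given the next state
      R[s, a]       immediate reward
      psi[n, a]     probability that node n picks action a
      eta[n, o, n2] probability of moving from node n to n2 on observation o
    """
    n_states = T.shape[0]
    n_nodes = psi.shape[0]
    # M[s2, n, n2]: probability of moving n -> n2 when the next state is s2,
    # obtained by summing over observations.
    M = np.einsum("po,nom->pnm", Z, eta)
    # P[n, s, n2, s2]: one-step transition of the joint (node, state) chain.
    P = np.einsum("na,sap,pnm->nsmp", psi, T, M)
    # r[n, s]: expected immediate reward under node n's action distribution.
    r = np.einsum("na,sa->ns", psi, R)
    size = n_nodes * n_states
    # Solve (I - gamma * P) v = r for the discounted value.
    v = np.linalg.solve(np.eye(size) - gamma * P.reshape(size, size),
                        r.reshape(size))
    return v.reshape(n_nodes, n_states)

# Toy usage: 2 states, 2 actions, 2 observations, 2 controller nodes.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.1, 0.9], [0.7, 0.3]]])
Z = np.array([[0.8, 0.2], [0.3, 0.7]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])
psi = np.array([[1.0, 0.0], [0.0, 1.0]])    # deterministic action per node
eta = np.array([[[1.0, 0.0], [0.0, 1.0]],   # node 0: stay on o=0, switch on o=1
                [[1.0, 0.0], [0.0, 1.0]]])  # node 1: same rule
print(evaluate_controller(T, Z, R, psi, eta, gamma=0.9))
```

Under these assumptions the value is a smooth function of `psi` and `eta`, which is what makes a gradient-based search over stochastic controllers possible, while a deterministic controller corresponds to the special case where every row of `psi` and `eta` is one-hot.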
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Optimization and Search Problems
