Solving POMDPs by Searching in Policy Space
Eric A. Hansen

TL;DR
This paper solves POMDPs by searching explicitly in policy space, representing a policy as a finite-state controller. Its policy iteration algorithm can outperform value iteration, and a related heuristic search algorithm focuses computational effort on regions reachable from a start state.
Contribution
It presents a policy iteration algorithm for infinite-horizon POMDPs and, building on it, a heuristic search algorithm.
Findings
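Policy iteration can outperform value iteration in solving infinite-horizon POMDPs.
Heuristic search focuses computation on regions of the problem space that are reachable, or likely to be reached, from a start state.
The heuristic search algorithm builds on policy iteration and promises further speedup.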
Abstract
Most algorithms for solving POMDPs iteratively improve a value function that implicitly represents a policy and are said to search in value function space. This paper presents an approach to solving POMDPs that represents a policy explicitly as a finite-state controller and iteratively improves the controller by search in policy space. Two related algorithms illustrate this approach. The first is a policy iteration algorithm that can outperform value iteration in solving infinite-horizon POMDPs. It provides the foundation for a new heuristic search algorithm that promises further speedup by focusing computational effort on regions of the problem space that are reachable, or likely to be reached, from a start state.
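To make the idea of an explicit finite-state controller concrete, here is a minimal sketch of how such a controller can be evaluated. It is not taken from the paper: the function names, the array layout, and the two-state toy model are all assumptions chosen for illustration. Each controller node is assumed to fix one action, and each observation selects the next node. The value of starting in node n and state s then satisfies the standard linear relation V(n, s) = R(s, a) + gamma * sum over s', o of T(s' | s, a) * O(o | s', a) * V(succ(n, o), s'). The sketch approximates this by repeated substitution rather than by solving the linear system directly.

```python
from typing import List

Matrix = List[List[float]]


def evaluate_controller(
    action: List[int],      # action[n]: action taken in controller node n
    succ: List[List[int]],  # succ[n][o]: next node after observation o
    T: List[Matrix],        # T[a][s][s2]: transition probability
    O: List[Matrix],        # O[a][s2][o]: observation probability
    R: Matrix,              # R[s][a]: immediate reward
    gamma: float = 0.95,
    tol: float = 1e-10,
    max_iters: int = 100_000,
) -> Matrix:
    """Return V[n][s], the discounted value of starting in node n and state s."""
    n_nodes, n_states, n_obs = len(action), len(R), len(succ[0])
    V = [[0.0] * n_states for _ in range(n_nodes)]
    for _ in range(max_iters):
        new_V = [[0.0] * n_states for _ in range(n_nodes)]
        delta = 0.0
        for n in range(n_nodes):
            a = action[n]
            for s in range(n_states):
                future = 0.0
                for s2 in range(n_states):
                    for o in range(n_obs):
                        future += T[a][s][s2] * O[a][s2][o] * V[succ[n][o]][s2]
                new_V[n][s] = R[s][a] + gamma * future
                delta = max(delta, abs(new_V[n][s] - V[n][s]))
        V = new_V
        if delta < tol:  # successive iterates agree: treat as converged
            break
    return V


def value_at_belief(V: Matrix, node: int, belief: List[float]) -> float:
    """Expected value of starting the controller in `node` under `belief`."""
    return sum(b * v for b, v in zip(belief, V[node]))


# Hypothetical toy model: 2 states, 2 actions (0 = stay, 1 = switch),
# 2 observations that report the current state correctly 80% of the time.
T = [[[1.0, 0.0], [0.0, 1.0]],   # stay
     [[0.0, 1.0], [1.0, 0.0]]]   # switch
O = [[[0.8, 0.2], [0.2, 0.8]],
     [[0.8, 0.2], [0.2, 0.8]]]
R = [[1.0, 0.0],                 # reward 1 only for staying in state 0
     [0.0, 0.0]]

# Two-node controller: node 0 stays, node 1 switches;
# observation 0 leads to node 0, observation 1 leads to node 1.
action = [0, 1]
succ = [[0, 1], [0, 1]]

V = evaluate_controller(action, succ, T, O, R)
start_belief = [0.5, 0.5]
best_start = max(range(len(action)),
                 key=lambda n: value_at_belief(V, n, start_belief))
```

Because every node's value is linear in the belief, comparing nodes at a given start belief reduces to the dot product in value_at_belief. A policy-space search would change action and succ and re-evaluate; that improvement step is not shown here.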
Taxonomy
Topics: Reinforcement Learning in Robotics · Formal Methods in Verification · Advanced Control Systems Optimization
