Solving POMDPs by Searching in Policy Space

Eric A. Hansen

arXiv:1301.7380 · cs.AI · February 1, 2013 · 191 cites

TL;DR

This paper proposes solving POMDPs by searching explicitly in policy space, with the policy represented as a finite-state controller. Its policy iteration algorithm can outperform value iteration on infinite-horizon POMDPs, and a heuristic search algorithm built on it focuses computation on regions of the problem space reachable from a start state.

Contribution

It presents a new policy iteration algorithm and a heuristic search method that improve efficiency in solving infinite-horizon POMDPs.

Findings

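01. Policy iteration can outperform value iteration on infinite-horizon POMDPs.

02. Heuristic search focuses computation on regions of the problem space that are reachable, or likely to be reached, from a start state.

03. The heuristic search algorithm builds on policy iteration and promises further speedup.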

Abstract

Most algorithms for solving POMDPs iteratively improve a value function that implicitly represents a policy and are said to search in value function space. This paper presents an approach to solving POMDPs that represents a policy explicitly as a finite-state controller and iteratively improves the controller by search in policy space. Two related algorithms illustrate this approach. The first is a policy iteration algorithm that can outperform value iteration in solving infinite-horizon POMDPs. It provides the foundation for a new heuristic search algorithm that promises further speedup by focusing computational effort on regions of the problem space that are reachable, or likely to be reached, from a start state.
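
The abstract does not spell out the algorithms, so the following is only a minimal sketch of one ingredient that any search over finite-state controllers needs: evaluating a fixed controller. It assumes the standard discounted POMDP model, a deterministic controller in which each node selects one action and each observation selects a successor node, and a simple fixed-point iteration chosen here for brevity (an exact linear solve is an alternative). All names (`evaluate_controller`, `belief_value`, `node_action`, `node_succ`, `T`, `O`, `R`) are hypothetical and not taken from the paper.

```python
from typing import List


def evaluate_controller(
    node_action: List[int],          # node_action[n] = action taken in node n
    node_succ: List[List[int]],      # node_succ[n][o] = next node after observation o
    T: List[List[List[float]]],      # T[a][s][s2] = P(s2 | s, a)
    O: List[List[List[float]]],      # O[a][s2][o] = P(o | s2, a)
    R: List[List[float]],            # R[a][s] = immediate reward
    gamma: float,                    # discount factor, 0 <= gamma < 1
    tol: float = 1e-10,
    max_iters: int = 100_000,
) -> List[List[float]]:
    """Return V[n][s]: expected discounted reward of starting the
    controller in node n while the hidden state is s."""
    n_nodes = len(node_action)
    n_states = len(R[0])
    n_obs = len(node_succ[0])
    V = [[0.0] * n_states for _ in range(n_nodes)]
    for _ in range(max_iters):
        delta = 0.0
        new_V = [[0.0] * n_states for _ in range(n_nodes)]
        for n in range(n_nodes):
            a = node_action[n]
            for s in range(n_states):
                future = 0.0
                for s2 in range(n_states):
                    # Average over observations, each of which moves the
                    # controller to a (possibly different) successor node.
                    obs_avg = sum(
                        O[a][s2][o] * V[node_succ[n][o]][s2]
                        for o in range(n_obs)
                    )
                    future += T[a][s][s2] * obs_avg
                new_V[n][s] = R[a][s] + gamma * future
                delta = max(delta, abs(new_V[n][s] - V[n][s]))
        V = new_V
        if delta < tol:
            break
    return V


def belief_value(V: List[List[float]], node: int, belief: List[float]) -> float:
    """Value of starting in `node` under a belief (distribution over states)."""
    return sum(b * v for b, v in zip(belief, V[node]))


# Degenerate check: one state, one action, one observation, reward 1 per step.
# The value approaches the geometric sum 1 / (1 - 0.5) = 2.
V = evaluate_controller([0], [[0]], [[[1.0]]], [[[1.0]]], [[1.0]], gamma=0.5)
```

Each controller node contributes one vector `V[n]` over states, so the controller's value at a belief is the best `belief_value` over its nodes. This is what lets an explicit controller stand in for the implicitly represented policy that value-function methods maintain.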

Taxonomy

Topics: Reinforcement Learning in Robotics · Formal Methods in Verification · Advanced Control Systems Optimization