Solving POMDPs by Searching the Space of Finite Policies

Nicolas Meuleau; Kee-Eung Kim; Leslie Pack Kaelbling; Anthony R. Cassandra

arXiv:1301.6720 · cs.AI · January 30, 2013 · 148 cites

TL;DR

This paper approximately solves POMDPs by searching a restricted set of finite policies, represented as finite state automata of a given size. The search is still intractable in general, but its complexity drops sharply when the POMDP and/or the policy are further constrained.

Contribution

It frames POMDP solving as finding the best policy among finite state automata of a given size, and gives algorithms for both deterministic and stochastic policies.

Findings

01 A branch-and-bound method finds globally optimal deterministic policies.

02 A gradient-ascent method finds locally optimal stochastic policies.

03 Both methods give good empirical results.

Abstract

Solving partially observable Markov decision processes (POMDPs) is highly intractable in general, at least in part because the optimal policy may be infinitely large. In this paper, we explore the problem of finding the optimal policy from a restricted set of policies, represented as finite state automata of a given size. This problem is also intractable, but we show that the complexity can be greatly reduced when the POMDP and/or policy are further constrained. We demonstrate good empirical results with a branch-and-bound method for finding globally optimal deterministic policies, and a gradient-ascent method for finding locally optimal stochastic policies.
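
To make the idea of a policy "represented as a finite state automaton of a given size" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's algorithm. It evaluates a finite-state controller exactly by solving a linear system over (node, state) pairs, then finds the best deterministic controller by plain exhaustive enumeration, which stands in for the branch-and-bound search described in the abstract. The array names (`T`, `Z`, `R`, `psi`, `eta`), the helper `alternating_pomdp`, and the toy problem itself are hypothetical and chosen only for this example.

```python
import itertools
import numpy as np


def evaluate_controller(T, Z, R, gamma, psi, eta):
    """Return V[n, s]: discounted value of starting in node n and state s.

    Assumed layout (not from the paper):
      T[a, s, t]   = P(next state t | state s, action a)
      Z[a, t, o]   = P(observation o | action a, next state t)
      R[s, a]      = immediate reward
      psi[n, a]    = P(action a | controller node n)
      eta[n, o, m] = P(next node m | node n, observation o)
    """
    n_states = T.shape[1]
    n_nodes = psi.shape[0]
    # Transition matrix of the Markov chain over (node, state) pairs.
    P = np.einsum("na,ast,ato,nom->nsmt", psi, T, Z, eta)
    P = P.reshape(n_nodes * n_states, n_nodes * n_states)
    # Expected immediate reward for each (node, state) pair.
    r = np.einsum("na,sa->ns", psi, R).reshape(-1)
    # Solve V = r + gamma * P V.
    V = np.linalg.solve(np.eye(P.shape[0]) - gamma * P, r)
    return V.reshape(n_nodes, n_states)


def best_deterministic_controller(T, Z, R, gamma, b0, n_nodes):
    """Exhaustively search deterministic controllers with n_nodes nodes.

    The controller always starts in node 0; b0 is the initial state
    distribution. Brute force is only feasible for tiny problems.
    """
    n_actions = T.shape[0]
    n_obs = Z.shape[2]
    best_value, best_actions, best_succ = -np.inf, None, None
    for actions in itertools.product(range(n_actions), repeat=n_nodes):
        psi = np.eye(n_actions)[list(actions)]  # one-hot action per node
        for succ in itertools.product(range(n_nodes), repeat=n_nodes * n_obs):
            succ_arr = np.array(succ).reshape(n_nodes, n_obs)
            eta = np.eye(n_nodes)[succ_arr]  # one-hot successor node
            V = evaluate_controller(T, Z, R, gamma, psi, eta)
            value = float(V[0] @ b0)
            if value > best_value:
                best_value, best_actions, best_succ = value, actions, succ_arr
    return best_value, best_actions, best_succ


def alternating_pomdp():
    """Toy POMDP: the hidden state flips every step, observations carry
    no information, and reward is 1 when the action index equals the state."""
    T = np.zeros((2, 2, 2))
    T[:, 0, 1] = 1.0
    T[:, 1, 0] = 1.0
    Z = np.ones((2, 2, 1))        # a single, uninformative observation
    R = np.eye(2)                 # R[s, a] = 1 if a == s else 0
    b0 = np.array([1.0, 0.0])     # start in state 0
    return T, Z, R, b0


if __name__ == "__main__":
    T, Z, R, b0 = alternating_pomdp()
    for n_nodes in (1, 2):
        value, actions, succ = best_deterministic_controller(
            T, Z, R, 0.5, b0, n_nodes
        )
        print(n_nodes, round(value, 4), actions, succ.tolist())
```

On this toy problem a one-node controller must repeat a single action and is rewarded only every other step, while a two-node controller can alternate actions in step with the hidden state. The number of deterministic controllers grows as |A|^N · N^(N·|O|) for N nodes, which is why exhaustive enumeration does not scale and a pruned search such as branch-and-bound is of interest.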

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Optimization and Search Problems