Value-Function Approximations for Partially Observable Markov Decision Processes

M. Hauskrecht

arXiv:1106.0234·cs.AI·June 2, 2011

TL;DR

This paper reviews and develops approximation methods for solving POMDPs, balancing computational efficiency and accuracy, supported by experiments in agent navigation scenarios.

Contribution

The paper surveys existing approximation techniques for POMDPs, analyzes their properties and relations, and introduces new methods and refinements of existing ones.

Findings

01

New approximation methods proposed.

02

Analysis of properties and relations of existing methods.

03

Experimental validation on an agent navigation problem.

Abstract

Partially observable Markov decision processes (POMDPs) provide an elegant mathematical framework for modeling complex decision and planning problems in stochastic domains in which states of the system are observable only indirectly, via a set of imperfect or noisy observations. The modeling advantage of POMDPs, however, comes at a price -- exact methods for solving them are computationally very expensive and thus applicable in practice only to very simple problems. We focus on efficient approximation (heuristic) methods that attempt to alleviate the computational problem and trade off accuracy for speed. We have two objectives here. First, we survey various approximation methods, analyze their properties and relations and provide some new insights into their differences. Second, we present a number of new approximation methods and novel refinements of existing techniques. The…
