Value-Directed Sampling Methods for POMDPs

Pascal Poupart; Luis E. Ortiz; Craig Boutilier

arXiv:1301.2305 · cs.AI · January 14, 2013 · 5 cites

Open Access

TL;DR

This paper introduces value-directed sampling methods for POMDPs, providing error bounds and an adaptive sampling procedure to improve decision-making accuracy with particle filtering.

Contribution

It develops a novel approach that dynamically allocates sampling effort in particle filtering for POMDPs based on value considerations, with theoretical error bounds and empirical validation.

Findings

1. Error bounds on decision quality with importance sampling
2. An adaptive sampling procedure reduces unnecessary computation
3. Empirical results show improved policy differentiation

Abstract

We consider the problem of approximate belief-state monitoring using particle filtering for the purposes of implementing a policy for a partially-observable Markov decision process (POMDP). While particle filtering has become a widely-used tool in AI for monitoring dynamical systems, rather scant attention has been paid to its use in the context of decision making. Assuming the existence of a value function, we derive error bounds on decision quality associated with filtering using importance sampling. We also describe an adaptive procedure that can be used to dynamically determine the number of samples required to meet specific error bounds. Empirical evidence is offered supporting this technique as a profitable means of directing sampling effort where it is needed to distinguish policies.
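
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm and does not reproduce their error bounds. It assumes a small discrete POMDP, a value function given as a list of alpha-vectors (one per candidate policy), and a Hoeffding-style stopping rule used here only as a heuristic stand-in for the paper's bounds. All names (`propagate`, `estimate_values`, `adaptive_filter`, `T`, `O`, `alphas`) and the two-state model are hypothetical.

```python
import math
import random


def propagate(particles, action, obs, T, O, n, rng):
    """One importance-sampling filter step.

    Draw n prior particles, push each through the transition model T,
    and weight the successor state by the observation likelihood O.
    """
    states, weights = [], []
    for _ in range(n):
        s = rng.choice(particles)
        row = T[action][s]
        s_next = rng.choices(range(len(row)), weights=row)[0]
        states.append(s_next)
        weights.append(O[action][s_next][obs])
    return states, weights


def estimate_values(states, weights, alphas):
    """Self-normalized estimate of b . alpha for every alpha-vector."""
    total = sum(weights)
    if total == 0:
        raise ValueError("observation has zero likelihood under all particles")
    return [sum(w * alpha[s] for s, w in zip(states, weights)) / total
            for alpha in alphas]


def adaptive_filter(particles, action, obs, T, O, alphas,
                    delta=0.05, n0=50, n_max=20000, seed=0):
    """Double the sample count until the two best alpha-vectors separate.

    ASSUMPTION: the stopping test uses a Hoeffding-style radius as a
    heuristic; it is not the bound derived in the paper.
    """
    rng = random.Random(seed)
    span = max(max(a) for a in alphas) - min(min(a) for a in alphas)
    n = n0
    while True:
        states, weights = propagate(particles, action, obs, T, O, n, rng)
        values = estimate_values(states, weights, alphas)
        ranked = sorted(values, reverse=True)
        gap = ranked[0] - ranked[1]
        eps = span * math.sqrt(math.log(2.0 / delta) / (2.0 * n))
        if gap > 2.0 * eps or n >= n_max:
            return values.index(ranked[0]), n, values
        n = min(2 * n, n_max)


# Hypothetical two-state model with a single action and two observations.
T = [[[0.9, 0.1], [0.1, 0.9]]]        # T[action][s][s_next]
O = [[[0.85, 0.15], [0.15, 0.85]]]    # O[action][s_next][obs]
alphas = [[1.0, 0.0], [0.0, 1.0]]     # one alpha-vector per candidate policy
prior = [0] * 50 + [1] * 50           # particles for a uniform belief

best, n_used, values = adaptive_filter(prior, action=0, obs=0,
                                       T=T, O=O, alphas=alphas)
print(best, n_used, values)
```

The sketch spends samples only while the leading policies are hard to tell apart: when the estimated value gap is large relative to the confidence radius, the loop stops early, and when the gap is small, the sample count keeps doubling up to `n_max`.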

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Distributed Sensor Networks and Detection Algorithms