Optimal Sensing via Multi-armed Bandit Relaxations in Mixed Observability Domains

Mikko Lauri; Risto Ritala

arXiv:1603.04586 · cs.AI · March 16, 2016

TL;DR

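This paper develops a method for information-gathering decisions in mixed observability domains. Relaxing constraints yields an upper bound on the optimal value function, and under identified conditions the relaxed problem is a multi-armed bandit that is easy to solve. The bound is used to prune the search space, and simulations assess the effect of pruning on solution quality.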
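Why relaxing constraints gives an upper bound can be stated in one line. The notation here is assumed for illustration and is not taken from the paper: $\Pi$ denotes the policies that respect the constraints imposed by the fully observable internal state, $\bar{\Pi} \supseteq \Pi$ the larger set permitted once those constraints are relaxed, and $V^{\pi}(b)$ the expected information obtained by policy $\pi$ from belief $b$.

```latex
V^{*}(b) = \max_{\pi \in \Pi} V^{\pi}(b) \;\le\; \max_{\pi \in \bar{\Pi}} V^{\pi}(b) = \bar{V}(b)
```

Maximizing over a superset can never give a smaller value, so the relaxed optimum $\bar{V}$ bounds $V^{*}$ from above. The argument assumes the relaxed problem scores every originally feasible policy with the same objective.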

Contribution

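It introduces a constraint relaxation that, under identified conditions, turns the sensing problem into a multi-armed bandit with an easily computable optimal policy. The resulting upper bound on the optimal value function is then used to prune the search space of the original mixed observability problem.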
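A multi-armed bandit is "easily computable" because of a classical result: for discounted bandits, always playing the arm with the highest Gittins index is optimal. The abstract does not state which index rule or arm model the authors use, so the sketch below assumes the simplest possible arms: each arm's nonnegative reward sequence is known in advance and is zero after the listed entries. Under that assumption a stopping time is just a prefix length, and the index is the best discounted average reward over prefixes. The function names `deterministic_gittins_index` and `gittins_schedule` are hypothetical.

```python
from collections.abc import Sequence


def deterministic_gittins_index(rewards: Sequence[float], beta: float) -> float:
    """Gittins index of an arm whose remaining rewards are known in advance.

    Assumes nonnegative rewards followed by zeros forever. With no randomness,
    a stopping time is just a prefix length k >= 1, so the index is
    max_k (sum_{t<k} beta^t r_t) / (sum_{t<k} beta^t).
    """
    best, num, den, disc = 0.0, 0.0, 0.0, 1.0  # an exhausted arm has index 0
    for r in rewards:
        num += disc * r
        den += disc
        disc *= beta
        best = max(best, num / den)
    return best


def gittins_schedule(arms: Sequence[Sequence[float]], beta: float) -> tuple[list[int], float]:
    """Play the live arm with the largest index until every arm is exhausted.

    Returns the order of arm plays and the total discounted reward collected.
    """
    positions = [0] * len(arms)
    schedule: list[int] = []
    total, disc = 0.0, 1.0
    while any(p < len(arm) for p, arm in zip(positions, arms)):
        live = [j for j, arm in enumerate(arms) if positions[j] < len(arm)]
        # Only the played arm advances; the others stay frozen (the bandit property).
        j = max(live, key=lambda i: deterministic_gittins_index(arms[i][positions[i]:], beta))
        total += disc * arms[j][positions[j]]
        disc *= beta
        positions[j] += 1
        schedule.append(j)
    return schedule, total


order, value = gittins_schedule([[1.0, 0.0, 10.0], [2.0]], beta=0.9)
print(order, round(value, 3))  # [0, 0, 0, 1] 10.558
```

In this toy instance the index rule first plays arm 0 even though its next reward (1.0) is smaller than arm 1's (2.0), because the large delayed reward raises arm 0's index. The other interleavings [1, 0, 0, 0], [0, 1, 0, 0] and [0, 0, 1, 0] collect only about 10.19, 10.09 and 9.91.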

Findings

01

Effective pruning of the search space demonstrated in simulations of a target monitoring domain

02

Relaxing constraints yields an upper bound on the optimal value function that can be used for pruning

03

Conditions identified under which the relaxed problem is a multi-armed bandit

Abstract

Sequential decision making under uncertainty is studied in a mixed observability domain. The goal is to maximize the amount of information obtained on a partially observable stochastic process under constraints imposed by a fully observable internal state. An upper bound for the optimal value function is derived by relaxing constraints. We identify conditions under which the relaxed problem is a multi-armed bandit whose optimal policy is easily computable. The upper bound is applied to prune the search space in the original problem, and the effect on solution quality is assessed via simulation experiments. Empirical results show effective pruning of the search space in a target monitoring domain.
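Using an upper bound to prune a search space follows the usual branch-and-bound pattern. The abstract does not give the authors' search algorithm, so the following is a generic sketch under stated assumptions: a finite-horizon search tree whose model is supplied through hypothetical callables `actions`, `outcomes` and `upper`, where `upper` stands in for the relaxed-problem bound. An action is skipped when even its optimistic value (expected immediate reward plus the bound at its successors) cannot beat the best action already evaluated. The pruning is exact as long as `upper` never underestimates the true optimal value. The toy model at the bottom is made up for illustration.

```python
from typing import Callable, Hashable, Iterable

# Hypothetical model interface, assumed for illustration (not the paper's code):
Actions = Callable[[Hashable], Iterable[Hashable]]
# outcomes(s, a) -> list of (probability, reward, next_state)
Outcomes = Callable[[Hashable, Hashable], list[tuple[float, float, Hashable]]]
# upper(s, depth) -> upper bound on the optimal value of s with `depth` steps left
Bound = Callable[[Hashable, int], float]


def pruned_value(s: Hashable, depth: int, actions: Actions, outcomes: Outcomes,
                 upper: Bound, stats: dict[str, int]) -> float:
    """Finite-horizon optimal value of s, skipping actions the bound rules out."""
    if depth == 0:
        return 0.0
    # Optimistic Q-value per action: expected immediate reward plus the bound at successors.
    candidates = []
    for a in actions(s):
        outs = outcomes(s, a)
        q_ub = sum(p * (r + upper(s2, depth - 1)) for p, r, s2 in outs)
        candidates.append((q_ub, outs))
    # Evaluate the most promising actions first so the incumbent improves early.
    candidates.sort(key=lambda c: c[0], reverse=True)
    best = float("-inf")
    for i, (q_ub, outs) in enumerate(candidates):
        if q_ub <= best:
            # Bounds are sorted, so none of the remaining actions can beat `best`.
            stats["pruned"] += len(candidates) - i
            break
        q = sum(p * (r + pruned_value(s2, depth - 1, actions, outcomes, upper, stats))
                for p, r, s2 in outs)
        best = max(best, q)
    return best


# Made-up toy instance: 3 states, 3 sensing actions; action a == s is the most informative.
def toy_actions(s):
    return range(3)


def toy_outcomes(s, a):
    r = 1.0 if a == s else (0.5 if a == (s + 1) % 3 else 0.0)
    return [(0.8, r, (s + 1) % 3), (0.2, r, s)]


def toy_upper(s, depth):
    return 1.0 * depth  # crude r_max * depth bound; a relaxed-problem bound would go here
```

For example, with `stats = {"pruned": 0}`, the call `pruned_value(0, 3, toy_actions, toy_outcomes, toy_upper, stats)` returns about 3.0 and records 14 pruned actions. Passing an `upper` that returns `float("inf")` disables pruning, and that exhaustive search finds the same value. A tighter bound, such as one obtained from a relaxed problem, would prune more aggressively than the crude `r_max * depth` bound used here.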