Optimal Sensing via Multi-armed Bandit Relaxations in Mixed Observability Domains
Mikko Lauri, Risto Ritala

TL;DR
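This paper derives an upper bound for sequential information gathering in partially observable environments by relaxing the constraints imposed by a fully observable internal state. Under identified conditions the relaxed problem is a multi-armed bandit. The bound is used to prune the search space, and the effect on solution quality is assessed in simulation.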
Contribution
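It introduces a constraint relaxation that yields an upper bound on the optimal value function. It also identifies conditions under which the relaxed problem is a multi-armed bandit with an easily computable optimal policy. Together, these give a practical way to prune the search in mixed observability domains.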
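Why relaxing constraints gives an upper bound fits in one line. The notation below is assumed for illustration and is not taken from the paper. Let b be the belief over the partially observable process and x the fully observable internal state. Let Π(x) be the policies admissible under the constraints, and Π̄(x) ⊇ Π(x) the larger set admitted once the constraints are relaxed. Rewards and belief dynamics are assumed unchanged by the relaxation.

```latex
V^{*}(b, x) \;=\; \max_{\pi \in \Pi(x)} V^{\pi}(b, x)
\;\le\; \max_{\pi \in \bar{\Pi}(x)} V^{\pi}(b, x) \;=\; \bar{V}^{*}(b, x),
\qquad \text{since } \Pi(x) \subseteq \bar{\Pi}(x).
```

Maximizing over a superset can never give a smaller value. Any cheaply computable \bar{V}^{*}, such as the value of a bandit, is therefore a valid optimistic estimate of the constrained optimum.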
Findings
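Effective pruning of the search space demonstrated in simulations of a target monitoring domain
Relaxing constraints yields an upper bound on the optimal value function, which is used to prune the search
Conditions identified under which the relaxed problem is a multi-armed bandit with an easily computable optimal policy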
Abstract
Sequential decision making under uncertainty is studied in a mixed observability domain. The goal is to maximize the amount of information obtained on a partially observable stochastic process under constraints imposed by a fully observable internal state. An upper bound for the optimal value function is derived by relaxing constraints. We identify conditions under which the relaxed problem is a multi-armed bandit whose optimal policy is easily computable. The upper bound is applied to prune the search space in the original problem, and the effect on solution quality is assessed via simulation experiments. Empirical results show effective pruning of the search space in a target monitoring domain.
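The relax-then-prune idea in the abstract can be illustrated with a minimal sketch. It uses a deterministic toy model rather than the paper's stochastic, partially observable one. The names `gain`, `relaxed_bound`, `branch_and_bound` and the problem instance are all hypothetical.

In the toy model, a sensor moves along a graph of locations, one target per location. The sensor may only stay or move to an adjacent location, which plays the role of the constrained internal state. Each observation of a target yields a non-increasing sequence of information gains. Dropping the movement constraint turns the problem into a deterministic bandit. Its optimum (take the largest remaining gains) upper-bounds the constrained value-to-go and drives branch-and-bound pruning.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy model (not the paper's): rewards[t][k] is the information gain
# from the (k+1)-th observation of target t. Each list is assumed non-negative and
# non-increasing (diminishing returns); sensing an exhausted target yields 0.
Rewards = Sequence[Sequence[float]]


def gain(rewards: Rewards, target: int, counts: Tuple[int, ...]) -> float:
    """Reward for sensing `target` once more, given per-target observation counts."""
    k = counts[target]
    return rewards[target][k] if k < len(rewards[target]) else 0.0


def relaxed_bound(rewards: Rewards, counts: Tuple[int, ...], steps: int) -> float:
    """Optimal value of the relaxed problem with the movement constraint dropped.

    Any target may then be sensed at every step, giving a deterministic bandit.
    Because each reward sequence is non-increasing, taking the `steps` largest
    remaining rewards is both feasible and optimal, so this upper-bounds the
    constrained problem's value-to-go.
    """
    remaining = sorted(
        (r for t, seq in enumerate(rewards) for r in seq[counts[t]:]), reverse=True
    )
    return sum(remaining[:steps])


def branch_and_bound(
    rewards: Rewards,
    neighbors: List[List[int]],
    start: int,
    horizon: int,
    prune: bool = True,
) -> Tuple[float, int]:
    """Depth-first search over sensor paths; returns (best value, nodes expanded).

    Constraint: the sensor may stay or move to an adjacent location, then senses
    the target at its new location.
    """
    best = 0.0  # valid initial incumbent because all rewards are non-negative
    expanded = 0

    def dfs(loc: int, counts: Tuple[int, ...], steps_left: int, acc: float) -> None:
        nonlocal best, expanded
        expanded += 1
        if steps_left == 0:
            best = max(best, acc)
            return
        # Prune: even the relaxed (bandit) optimum cannot beat the incumbent.
        if prune and acc + relaxed_bound(rewards, counts, steps_left) <= best:
            return
        for nxt in [loc] + neighbors[loc]:
            r = gain(rewards, nxt, counts)
            new_counts = counts[:nxt] + (counts[nxt] + 1,) + counts[nxt + 1:]
            dfs(nxt, new_counts, steps_left - 1, acc + r)

    dfs(start, (0,) * len(rewards), horizon, 0.0)
    return best, expanded


if __name__ == "__main__":
    rewards = [[5.0, 1.0], [2.0, 1.0], [4.0, 3.0]]
    neighbors = [[1], [0, 2], [1]]  # locations on a line: 0 - 1 - 2
    print(branch_and_bound(rewards, neighbors, start=0, horizon=3))  # (11.0, 10)
    print(branch_and_bound(rewards, neighbors, start=0, horizon=3, prune=False))  # (11.0, 20)
```

On this instance both searches find the same optimum (path 0, 1, 2). The pruned search expands half as many nodes, because the bound rules out the branch that starts by moving to location 1. The bound stays valid only under the assumed non-increasing, non-negative rewards. In the paper's stochastic setting the relaxed bandit would instead be solved with an appropriate bandit policy under the conditions the authors identify.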
