Multi-armed Bandits with Constrained Arms and Hidden States

Varun Mehta; Rahul Meshram; Kesav Kaza; S. N. Merchant

arXiv:1710.07115·cs.SY·October 20, 2017

Open Access

TL;DR

This paper studies multi-armed bandit problems with hidden states and availability constraints, establishing structural properties, deriving index formulas, and comparing policies through numerical analysis.

Contribution

It introduces structural results for value functions, proves indexability, and derives index formulas for constrained multi-armed bandits with hidden states.

Findings

01. The optimal policy is a threshold policy.

02. Indexability of rested bandits is established.

03. Numerical examples compare the index policy with the myopic policy.

Abstract

The problem of rested and restless multi-armed bandits with constrained availability of arms is considered. The states of the arms evolve in a Markovian manner, and the exact states are hidden from the decision maker. First, some structural results on value functions are claimed. Following these results, the optimal policy turns out to be a threshold policy. Further, indexability of rested bandits is established and an index formula is derived. The performance of the index policy is illustrated and compared with that of the myopic policy using numerical examples.
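To make the terms in the abstract concrete, the following is a minimal illustrative sketch, not the paper's model. It assumes a two-state arm (states 0 and 1) whose hidden state is tracked by a belief, a threshold rule that plays the arm when the belief is at least the threshold (the direction is an assumption), and a myopic rule that picks the available arm with the largest expected immediate reward. All function names, parameters, and numbers are hypothetical.

```python
def belief_update(belief, p00, p11):
    """Predict the next belief P(state = 1) for an unobserved two-state arm.

    belief: current probability that the arm is in state 1 (assumed model).
    p00: assumed probability of staying in state 0.
    p11: assumed probability of staying in state 1.
    """
    return belief * p11 + (1.0 - belief) * (1.0 - p00)


def threshold_policy(belief, threshold):
    """Return 1 (play) if the belief is at least the threshold, else 0.

    Playing on high belief is an assumption made for this sketch.
    """
    return 1 if belief >= threshold else 0


def myopic_choice(beliefs, rewards, available):
    """Pick the available arm with the largest expected immediate reward.

    Expected reward of arm i is taken to be beliefs[i] * rewards[i],
    i.e. reward is earned only in state 1 (an assumption).
    Returns None if no arm is available.
    """
    best_arm, best_value = None, float("-inf")
    for i, (b, r, ok) in enumerate(zip(beliefs, rewards, available)):
        if ok and b * r > best_value:
            best_arm, best_value = i, b * r
    return best_arm


# Hypothetical usage: arm 0 has the highest belief but is unavailable.
next_belief = belief_update(0.5, p00=0.9, p11=0.8)
action = threshold_policy(next_belief, threshold=0.4)
chosen = myopic_choice([0.9, 0.5, 0.7], [1.0, 1.0, 1.0], [False, True, True])
```

The availability mask is how the sketch represents constrained arms: an arm that is unavailable is skipped regardless of its belief, which is why the myopic rule above does not select arm 0.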

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Smart Grid Energy Management