Multi-armed Bandits with Constrained Arms and Hidden States
Varun Mehta, Rahul Meshram, Kesav Kaza, S. N. Merchant

TL;DR
This paper studies multi-armed bandit problems with hidden states and availability constraints, establishing structural properties, deriving index formulas, and comparing policies through numerical analysis.
Contribution
It introduces structural results for value functions, proves indexability, and derives index formulas for constrained multi-armed bandits with hidden states.
Findings
The optimal policy is a threshold policy.
Indexability of rested bandits is established.
Numerical examples compare index and myopic policies.
Abstract
The problem of rested and restless multi-armed bandits with constrained availability of arms is considered. The states of the arms evolve in a Markovian manner, and the exact states are hidden from the decision maker. First, some structural results on the value functions are presented. Following these results, the optimal policy turns out to be a threshold policy. Further, indexability of rested bandits is established and an index formula is derived. The performance of the index policy is illustrated and compared with that of the myopic policy using numerical examples.
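Because the arm states are hidden, a decision maker in this kind of model typically acts on a belief: the posterior probability that each arm is in its "good" state. The Python sketch below illustrates that bookkeeping together with a myopic rule and a generic threshold test. It assumes a hypothetical model that this page does not specify: two hidden states per arm, a binary success observation with probabilities `rho0` (bad state) and `rho1` (good state), transition probabilities `p01` (bad to good) and `p11` (good to good), and an availability flag per arm at each step. All function names are illustrative, and the sketch does not compute the paper's threshold or its index formula.

```python
from typing import List, Optional


def propagate(pi: float, p01: float, p11: float) -> float:
    # One Markov step of the belief P(good) for a two-state chain
    # (assumed parameters: p01 = P(bad -> good), p11 = P(good -> good)).
    return pi * p11 + (1.0 - pi) * p01


def update_after_play(pi: float, success: bool, rho0: float, rho1: float,
                      p01: float, p11: float) -> float:
    # Bayes update on the (assumed) binary observation, then one Markov step.
    if success:
        num = pi * rho1
        den = pi * rho1 + (1.0 - pi) * rho0
    else:
        num = pi * (1.0 - rho1)
        den = pi * (1.0 - rho1) + (1.0 - pi) * (1.0 - rho0)
    posterior = num / den if den > 0 else pi
    return propagate(posterior, p01, p11)


def update_when_not_played(pi: float, p01: float, p11: float, rested: bool) -> float:
    # Rested arms: the state (hence the belief) is frozen when not played.
    # Restless arms: the state keeps evolving, so the belief is propagated.
    return pi if rested else propagate(pi, p01, p11)


def expected_reward(pi: float, rho0: float, rho1: float) -> float:
    # Immediate expected reward under the assumed success probabilities.
    return pi * rho1 + (1.0 - pi) * rho0


def myopic_choice(beliefs: List[float], available: List[bool],
                  rho0: float, rho1: float) -> Optional[int]:
    # Myopic policy: among currently available arms, pick the one with the
    # highest immediate expected reward; return None if no arm is available.
    candidates = [i for i, ok in enumerate(available) if ok]
    if not candidates:
        return None
    return max(candidates, key=lambda i: expected_reward(beliefs[i], rho0, rho1))


def threshold_play(pi: float, threshold: float) -> bool:
    # Generic threshold rule: play when the belief reaches the threshold.
    # The threshold value itself is a placeholder here, not the paper's result.
    return pi >= threshold


if __name__ == "__main__":
    beliefs = [0.2, 0.7, 0.9]
    available = [True, True, False]  # arm 2 is unavailable this step
    print(myopic_choice(beliefs, available, rho0=0.1, rho1=0.9))
    print(update_after_play(0.5, True, rho0=0.1, rho1=0.9, p01=0.2, p11=0.8))
```

In the usage example, arm 2 has the highest belief but is unavailable, so the myopic rule picks arm 1. Separating the rested and restless belief updates mirrors the distinction drawn in the abstract: a rested arm's belief stays fixed while it is idle, whereas a restless arm's belief keeps drifting toward the chain's stationary distribution.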
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Smart Grid Energy Management
