An Asymptotically Optimal Index Policy for Finite-Horizon Restless Bandits
Weici Hu, Peter Frazier

TL;DR
This paper introduces an index policy for finite-horizon restless bandits, proves that it is asymptotically optimal as the number of arms grows, and shows in simulation that it outperforms state-of-the-art heuristics.
Contribution
It develops a novel index-based policy for finite-horizon restless multi-armed bandits (RMABs) using Lagrangian relaxation, with a proof of asymptotic optimality and simulations in which it outperforms state-of-the-art heuristics.
Findings
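The policy is asymptotically optimal as the number of arms tends to infinity
It outperforms state-of-the-art heuristics in simulations across various problem settings
It applies to finite-horizon RMAB problems with multiple pulls per period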
Abstract
We consider the restless multi-armed bandit (RMAB) problem with a finite horizon and multiple pulls per period. Leveraging Lagrangian relaxation, we approximate the problem with a collection of single-arm problems. We then propose an index-based policy that uses optimal solutions of the single-arm problems to index individual arms, and offer a proof that it is asymptotically optimal as the number of arms tends to infinity. We also use simulation to show that this index-based policy performs better than state-of-the-art heuristics in various problem settings.
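To make the decomposition in the abstract concrete, here is a minimal sketch, assuming each arm is a small finite Markov decision process with two actions (0 = idle, 1 = pull). The function names, the array layout, and the fixed multiplier `lam` are illustrative assumptions, and ranking arms by the pull-versus-idle advantage is a simple stand-in, not necessarily the index the paper defines. It only shows the general shape of the idea: relax the per-period pull budget with a Lagrange multiplier, solve each arm on its own by backward induction, then rank arms and pull the top `m`.

```python
import numpy as np


def solve_single_arm(P, r, horizon, lam):
    """Backward induction for one arm with a per-pull penalty lam.

    P has shape (2, S, S): P[a] is the transition matrix under action a.
    r has shape (2, S): r[a] is the reward vector under action a.
    Returns Q with shape (horizon, S, 2), the penalized action values.
    """
    S = r.shape[1]
    Q = np.zeros((horizon, S, 2))
    V_next = np.zeros(S)  # assumption: zero terminal value
    for t in reversed(range(horizon)):
        for a in (0, 1):
            # Reward, minus the Lagrangian penalty when pulling,
            # plus the expected value of the next state.
            Q[t, :, a] = r[a] - lam * a + P[a] @ V_next
        V_next = Q[t].max(axis=1)
    return Q


def select_arms(Q, states, t, m):
    """Return the indices of the m arms with the largest pull advantage.

    states[i] is the current state of arm i. All arms are assumed to
    share the same dynamics, so a single Q table is enough.
    """
    advantage = Q[t, states, 1] - Q[t, states, 0]
    # A stable sort keeps tie-breaking deterministic (lowest arm index first).
    return np.argsort(-advantage, kind="stable")[:m]
```

In this sketch `lam` is a fixed input. A fuller treatment would tune the multiplier so that the relaxed solution respects the pull budget on average, and would handle arms with different dynamics by keeping one `Q` table per arm type.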
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Smart Grid Energy Management
