An Asymptotically Optimal Index Policy for Finite-Horizon Restless Bandits

Weici Hu; Peter Frazier

arXiv:1707.00205·math.OC·July 4, 2017·5 cites

TL;DR

This paper introduces an index policy for finite-horizon restless bandits, proves that it is asymptotically optimal as the number of arms grows, and shows in simulations that it outperforms state-of-the-art heuristics.

Contribution

The paper develops an index-based policy for finite-horizon RMABs via Lagrangian relaxation, proves that the policy is asymptotically optimal as the number of arms tends to infinity, and reports better simulated performance than existing heuristics.

Findings

01 The policy is asymptotically optimal as the number of arms increases

02 It outperforms existing heuristics in simulations

03 It is effective for finite-horizon RMAB problems

Abstract

We consider the restless multi-armed bandit (RMAB) problem with a finite horizon and multiple pulls per period. Using Lagrangian relaxation, we approximate the problem with a collection of single-arm problems. We then propose an index-based policy that uses optimal solutions of the single-arm problems to index individual arms, and we prove that it is asymptotically optimal as the number of arms tends to infinity. We also use simulation to show that this index-based policy performs better than state-of-the-art heuristics in various problem settings.
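To make the relaxation step concrete, here is a minimal sketch of one single-arm subproblem solved by backward induction. It is an illustration under stated assumptions, not the paper's algorithm: the function name `solve_single_arm`, the inputs `P`, `r`, `lam`, and `T`, and the use of a single scalar multiplier are all hypothetical choices made for the example. The idea shown is only the generic one: once the constraint on the number of pulls per period is relaxed, each arm can be treated as its own small finite-horizon decision problem in which every pull is charged a price `lam`.

```python
def solve_single_arm(P, r, lam, T):
    """Backward induction for one arm with a per-pull charge lam.

    Assumed (hypothetical) inputs:
      P[a][s][s2]  transition probability from s to s2 under action a
                   (a = 0 is idle, a = 1 is pull)
      r[a][s]      immediate reward in state s under action a
      lam          scalar price charged for each pull
      T            number of decision periods
    Returns V (value per period and state) and Q (value per period,
    action, and state).
    """
    S = len(r[0])
    V = [[0.0] * S for _ in range(T + 1)]  # V[T] is the zero terminal value
    Q = [[[0.0] * S for _ in range(2)] for _ in range(T)]
    for t in range(T - 1, -1, -1):
        for s in range(S):
            for a in (0, 1):
                # Expected value of the next period after taking action a
                cont = sum(P[a][s][s2] * V[t + 1][s2] for s2 in range(S))
                Q[t][a][s] = r[a][s] - lam * a + cont
            V[t][s] = max(Q[t][0][s], Q[t][1][s])
    return V, Q
```

Raising `lam` makes pulling less attractive in every state, which is what allows a multiplier to regulate how many arms are pulled overall. How the paper turns the single-arm solutions into an index for each arm is not specified in the abstract, so that step is deliberately left out of this sketch.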

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Smart Grid Energy Management