LP-based policies for restless bandits: necessary and sufficient conditions for (exponentially fast) asymptotic optimality

Nicolas Gast (POLARIS); Bruno Gaujal (POLARIS); Chen Yan (POLARIS)

arXiv:2106.10067 · math.OC · December 25, 2023

TL;DR

This paper analyses LP-based control policies for restless bandits, gives necessary and sufficient conditions for their asymptotic optimality, and introduces policies that outperform existing heuristics in experiments and remain robust to errors in the estimated transition matrices.

Contribution

The paper provides necessary and sufficient conditions for asymptotic optimality of LP-based policies and introduces the LP-index and LP-update policies with proven convergence rates.

Findings

01 The LP-index policy is asymptotically optimal with a square-root convergence rate.

02 The LP-update policy outperforms the LP-index policy and other heuristics in experiments.

03 The LP-update policy is robust to errors in the estimated transition matrices.

Abstract

We provide a framework to analyse control policies for the restless Markovian bandit model, under both finite and infinite time horizon. We show that when the population of arms goes to infinity, the value of the optimal control policy converges to the solution of a linear program (LP). We provide necessary and sufficient conditions for a generic control policy to be: i) asymptotically optimal; ii) asymptotically optimal with square root convergence rate; iii) asymptotically optimal with exponential rate. We then construct the LP-index policy that is asymptotically optimal with square root convergence rate on all models, and with exponential rate if the model is non-degenerate in finite horizon, and satisfies a uniform global attractor property in infinite horizon. We next define the LP-update policy, which is essentially a repeated LP-index policy that solves a new linear program at…
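
The convergence claim in the abstract can be made concrete with a sketch of a finite-horizon LP relaxation. The notation here is assumed for illustration and may differ from the paper's: each arm has states s in {1, ..., d} and actions a in {0, 1} (rest or activate), P^a is the transition matrix under action a, R^a_s is the reward in state s under action a, alpha is the fraction of arms activated at each step, m(0) is the initial state distribution, T is the horizon, and y_{s,a}(t) is the fraction of arms that are in state s and take action a at time t.

```latex
% Sketch of a finite-horizon LP relaxation (notation assumed, not taken from the paper)
\begin{align*}
\max_{y \ge 0}\quad
  & \sum_{t=0}^{T-1} \sum_{s=1}^{d} \sum_{a \in \{0,1\}} R^{a}_{s}\, y_{s,a}(t) \\
\text{s.t.}\quad
  & \sum_{s=1}^{d} y_{s,1}(t) = \alpha,
  && t = 0, \dots, T-1, \\
  & y_{s,0}(0) + y_{s,1}(0) = m_{s}(0),
  && s = 1, \dots, d, \\
  & y_{s,0}(t+1) + y_{s,1}(t+1)
    = \sum_{s'=1}^{d} \sum_{a \in \{0,1\}} y_{s',a}(t)\, P^{a}_{s's},
  && s = 1, \dots, d,\ \ t = 0, \dots, T-2.
\end{align*}
```

In this sketch, the requirement that an integer number of the N arms be activated at each step is replaced by a constraint on fractions of the population, which is what makes the problem linear. Per the abstract, the value of the optimal policy for the N-arm problem converges to the value of such an LP as N goes to infinity.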
