Achieving $\tilde{\mathcal{O}}(1/N)$ Optimality Gap in Restless Bandits through Gaussian Approximation
Chen Yan, Weina Wang, Lei Ying

TL;DR
This paper introduces a Gaussian-approximation-based policy for finite-horizon Restless Multi-Armed Bandits (RMABs) that achieves an $\tilde{\mathcal{O}}(1/N)$ optimality gap in degenerate cases, improving on the gap that LP-based policies attain in that setting.
Contribution
It presents a Stochastic-Programming-based policy, built on a Gaussian approximation, that is the first to attain an $\tilde{\mathcal{O}}(1/N)$ optimality gap in degenerate RMABs, extending guarantees beyond the non-degenerate case.
Findings
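Achieves an $\tilde{\mathcal{O}}(1/N)$ optimality gap for degenerate RMABs, under a uniqueness assumption.
Uses a Gaussian stochastic system that captures both the mean and the variance of the RMAB dynamics, rather than the mean alone as in the fluid approximation.
First to establish such an optimality gap in degenerate settings.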
Abstract
We study the finite-horizon Restless Multi-Armed Bandit (RMAB) problem with $N$ homogeneous arms. Prior work has shown that when an RMAB satisfies a non-degeneracy condition, Linear-Programming-based (LP-based) policies derived from the fluid approximation, which captures the mean dynamics of the system, achieve an exponentially small optimality gap. However, it is common for RMABs to be degenerate, in which case LP-based policies can result in a $\Theta(1/\sqrt{N})$ optimality gap per arm. In this paper, we propose a novel Stochastic-Programming-based (SP-based) policy that, under a uniqueness assumption, achieves an $\tilde{\mathcal{O}}(1/N)$ optimality gap for degenerate RMABs. Our approach is based on the construction of a Gaussian stochastic system that captures not only the mean but also the variance of the RMAB dynamics, resulting in a more accurate approximation than the fluid…
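To make the contrast between a mean-only (fluid) description and a mean-plus-variance (Gaussian) description concrete, here is a minimal sketch. It is not the paper's SP-based policy or its Gaussian stochastic system; it only assumes a toy one-step setting in which each of `n_arms` identical arms independently moves to state 1 with probability `p`. All function names and parameters are hypothetical and chosen for illustration.

```python
import math
import random
import statistics


def simulate_fraction(n_arms, p, rng):
    """Simulate one step: fraction of n_arms that move to state 1 (prob. p each)."""
    moved = sum(1 for _ in range(n_arms) if rng.random() < p)
    return moved / n_arms


def fluid_mean(p):
    """Fluid (mean-only) prediction of the fraction in state 1; ignores randomness."""
    return p


def gaussian_std(n_arms, p):
    """Standard deviation of the fraction under the central-limit (Gaussian) approximation."""
    return math.sqrt(p * (1.0 - p) / n_arms)


def empirical_std(n_arms, p, n_runs=2000, seed=0):
    """Empirical standard deviation of the simulated fraction over n_runs repetitions."""
    rng = random.Random(seed)
    samples = [simulate_fraction(n_arms, p, rng) for _ in range(n_runs)]
    return statistics.pstdev(samples)


if __name__ == "__main__":
    p = 0.3  # assumed transition probability, for illustration only
    for n in (100, 400, 1600):
        print(n, fluid_mean(p), gaussian_std(n, p), empirical_std(n, p))
```

In this toy setting the fluid prediction stays at `p` for every `N`, while the simulated fraction fluctuates around it on a scale of order $1/\sqrt{N}$, which is the scale the Gaussian term tracks. This is only meant to suggest why a model that also carries the variance can be more accurate than the fluid model.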
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Cognitive Radio Networks and Spectrum Sensing
Methods: Diffusion
