Model Predictive Control is almost Optimal for Heterogeneous Restless Multi-armed Bandits
Dheeraj Narasimha, Nicolas Gast

TL;DR
This paper shows that a model predictive control approach with LP updates is nearly optimal for the infinite-horizon heterogeneous restless multi-armed bandit problem, with strong theoretical guarantees and good practical efficiency.
Contribution
It extends the LP-update policy to heterogeneous RMABs with $N$ arms, proving an $O(\frac{\log N}{\sqrt{N}})$ optimality gap and connecting it to LP-index policies.
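To get a feel for how fast this bound shrinks, the Python sketch below evaluates the rate $\log N/\sqrt{N}$ for a few arm counts. It is only an illustration: the constant hidden in the $O(\cdot)$ is not given here and is omitted, and `gap_rate` is a hypothetical helper name.

```python
import math


def gap_rate(n_arms: int) -> float:
    """Rate log(N)/sqrt(N) from the O(log N / sqrt N) bound (hidden constant omitted)."""
    return math.log(n_arms) / math.sqrt(n_arms)


for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"N = {n:>7}: log(N)/sqrt(N) = {gap_rate(n):.4f}")
```

The function $\log N/\sqrt{N}$ peaks at $N = e^2 \approx 7.4$ and decreases for every $N \ge 8$, so the gap vanishes as the number of arms grows, only slightly more slowly than $1/\sqrt{N}$.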
Findings
LP-update policy achieves near-optimality in heterogeneous RMABs.
The policy is computationally efficient, using a small planning horizon of $\tau = 5$.
Theoretical guarantees generalize to weakly coupled Markov Decision Processes.
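The receding-horizon structure behind the LP-update policy can be sketched as follows: at every step the controller plans over the next $\tau$ steps, applies only the first decision, observes the new arm states, and re-plans. This Python sketch rests on loud assumptions: the two-state arms, the transition table `P_GOOD`, and the budget of one activation per step are hypothetical, and the paper's LP is replaced by brute-force search over activation sequences on the expected dynamics, which is exponential in $\tau$ and only works at toy sizes.

```python
import itertools
import random
from typing import Sequence

# Hypothetical toy model (not from the paper): each arm has two states,
# 0 = "bad" and 1 = "good"; activating a good arm earns reward 1.
# Heterogeneity: every arm has its own transition probabilities.
# P_GOOD[n][a][s] = P(arm n is good next step | action a, current state s).
P_GOOD = [
    [[0.2, 0.6], [0.1, 0.3]],  # arm 0: passive [from bad, from good], active [...]
    [[0.5, 0.9], [0.4, 0.5]],  # arm 1
    [[0.3, 0.7], [0.3, 0.2]],  # arm 2
]
N_ARMS = len(P_GOOD)  # budget: exactly one arm is activated per step


def plan_first_action(states: Sequence[int], horizon: int) -> int:
    """Open-loop plan over `horizon` steps on the expected dynamics.

    Stand-in for the paper's LP: enumerate which arm to activate at each
    step, score each sequence by its expected total reward, and return the
    arm activated at the first step of the best sequence.
    """
    best_value, best_first = float("-inf"), 0
    for seq in itertools.product(range(N_ARMS), repeat=horizon):
        good = [float(s) for s in states]  # P(arm n is good) at the current step
        value = 0.0
        for active in seq:
            value += good[active]  # expected reward collected at this step
            good = [
                g * P_GOOD[n][int(n == active)][1]
                + (1.0 - g) * P_GOOD[n][int(n == active)][0]
                for n, g in enumerate(good)
            ]
        if value > best_value:
            best_value, best_first = value, seq[0]
    return best_first


def run_mpc(horizon: int = 5, n_steps: int = 200, seed: int = 0) -> float:
    """Receding-horizon loop: re-plan at every step, execute only the first action."""
    rng = random.Random(seed)
    states = [0] * N_ARMS
    total = 0.0
    for _ in range(n_steps):
        active = plan_first_action(states, horizon)
        total += states[active]  # reward 1 if the activated arm is good
        states = [
            int(rng.random() < P_GOOD[n][int(n == active)][s])
            for n, s in enumerate(states)
        ]
    return total / n_steps


print(f"average reward with horizon 5: {run_mpc(horizon=5):.3f}")
```

Because only the first planned action is ever executed, the horizon trades planning cost against foresight. In the paper's setting each re-planning step is an LP whose size grows with $\tau$, which is one reason a small horizon such as $\tau = 5$ keeps the policy cheap.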
Abstract
We consider a general infinite-horizon heterogeneous restless multi-armed bandit (RMAB). Heterogeneity is a fundamental problem for many real-world systems, largely because it resists many concentration arguments. In this paper, we assume that each arm can have different model parameters. We show that, under a mild assumption of uniform ergodicity, a natural finite-horizon LP-update policy with randomized rounding, originally proposed for the homogeneous case, achieves an $O(\frac{\log N}{\sqrt{N}})$ optimality gap in infinite-horizon time-average reward problems for fully heterogeneous RMABs. In doing so, we provide strong theoretical guarantees for a well-known algorithm that works very well in practice. The LP-update policy is a model predictive approach that computes a decision at time $t$ by planning over a time horizon of length $\tau$. Our simulation…
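Randomized rounding is needed because the LP returns fractional activation probabilities, while each arm must be either activated or left passive. The Python sketch below uses systematic sampling, one standard dependent-rounding scheme; it is an assumption for illustration and may differ from the rounding used in the paper. Given probabilities $p_n \in [0,1]$, it activates arm $n$ with probability exactly $p_n$, and when $\sum_n p_n$ equals the budget it activates exactly that many arms on every draw. The helper name `systematic_round` and the example probabilities are hypothetical.

```python
import math
import random
from typing import List, Sequence


def systematic_round(probs: Sequence[float], rng: random.Random) -> List[int]:
    """Round fractional activation probabilities to a 0/1 action vector.

    Systematic sampling: draw one offset u ~ U[0, 1) and activate arm n iff a
    point u + k (k integer) falls in [C_{n-1}, C_n), where C_n = p_0 + ... + p_n.
    Each arm is activated with probability exactly p_n, and the total number
    of activations is floor(sum p) or ceil(sum p).
    """
    u = rng.random()
    actions: List[int] = []
    cum = 0.0
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("activation probabilities must lie in [0, 1]")
        lo, cum = cum, cum + p
        # Number of points u + k in [lo, cum); it is 0 or 1 because p <= 1.
        actions.append(math.ceil(cum - u) - math.ceil(lo - u))
    return actions


# Usage: hypothetical LP output for 4 arms whose probabilities sum to a budget of 2.
rng = random.Random(0)
probs = [0.5, 0.25, 0.75, 0.5]
n_draws = 20_000
counts = [0] * len(probs)
for _ in range(n_draws):
    a = systematic_round(probs, rng)
    assert sum(a) == 2  # the budget is met exactly on every draw
    counts = [c + x for c, x in zip(counts, a)]
print([round(c / n_draws, 2) for c in counts])  # empirical rates, close to probs
```

A single shared offset couples the arms, which is what keeps the total number of activations fixed; rounding each arm independently would match the marginals but could violate the budget.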
Taxonomy
Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Optimization and Search Problems
