Model Predictive Control is almost Optimal for Heterogeneous Restless Multi-armed Bandits

Dheeraj Narasimha; Nicolas Gast

arXiv:2511.08097·math.OC·November 12, 2025

TL;DR

This paper shows that a model predictive control approach with LP updates is nearly optimal for the infinite-horizon heterogeneous restless multi-armed bandit problem, combining strong theoretical guarantees with practical efficiency.

Contribution

It extends the LP-update policy to heterogeneous RMABs, proving an $O\left(\frac{\log N}{\sqrt{N}}\right)$ optimality gap and connecting it to LP-index policies.

Findings

01

LP-update policy achieves near-optimality in heterogeneous RMABs.

02

The policy is computationally efficient with a small planning horizon, $\tau=5$.

03

Theoretical guarantees generalize to weakly coupled Markov Decision Processes.

Abstract

We consider a general infinite horizon Heterogeneous Restless multi-armed Bandit (RMAB). Heterogeneity is a fundamental problem for many real-world systems largely because it resists many concentration arguments. In this paper, we assume that each of the $N$ arms can have different model parameters. We show that, under a mild assumption of uniform ergodicity, a natural finite-horizon LP-update policy with randomized rounding, which was originally proposed for the homogeneous case, achieves an $O(\log N \sqrt{1/N})$ optimality gap in infinite time average reward problems for fully heterogeneous RMABs. In doing so, we show results that provide strong theoretical guarantees on a well-known algorithm that works very well in practice. The LP-update policy is a model predictive approach that computes a decision at time $t$ by planning over a time-horizon $\{t, \dots, t + \tau\}$. Our simulation…
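The randomized rounding step mentioned in the abstract can be illustrated with a small sketch. The summary above does not spell out the paper's rounding scheme, so this example substitutes systematic sampling, a standard technique, as a stand-in. It assumes the LP has already produced one fractional activation probability per arm, and that these probabilities sum to an integer budget. The function name `systematic_round` and the input values are hypothetical.

```python
import math
import random


def systematic_round(probs, rng=None):
    """Pick a set of arms so that arm n is included with probability probs[n].

    Assumption (not from the paper): probs are fractional activation
    levels in [0, 1] whose sum is an integer budget B. Systematic
    sampling then activates exactly B arms on every draw.
    """
    rng = rng or random.Random()
    u = rng.random()  # single uniform offset in [0, 1)
    chosen = []
    cum = 0.0
    for n, p in enumerate(probs):
        lo, hi = cum, cum + p
        # Arm n is selected when one of the grid points u, u+1, u+2, ...
        # falls in its slice of the cumulative axis, which has length p.
        if math.floor(hi - u) > math.floor(lo - u):
            chosen.append(n)
        cum = hi
    return chosen


if __name__ == "__main__":
    # Hypothetical LP output for four heterogeneous arms, budget B = 2.
    fractional = [0.5, 0.25, 0.75, 0.5]
    print(systematic_round(fractional, random.Random(1)))
```

Because a single uniform offset drives all selections, the budget constraint holds exactly on every draw rather than only in expectation. In a model predictive loop, a rounding step of this kind would run once per time step, after re-solving the LP over the window $\{t, \dots, t+\tau\}$ from the current state.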

Taxonomy

Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Optimization and Search Problems