Networked Restless Multi-Arm Bandits with Reinforcement Learning

Hanmo Zhang; Zenghui Sun; Kai Wang

arXiv:2512.06274 · cs.LG · December 9, 2025

TL;DR

This paper introduces Networked RMAB, integrating RMAB with network interactions, and develops an efficient Q-learning algorithm that outperforms existing methods on real-world data.

Contribution

It presents a novel Networked RMAB framework, establishes its theoretical properties, and develops a scalable Q-learning algorithm for networked environments.
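The Q-learning ingredient can be illustrated with a minimal, generic tabular Q-learning sketch for a single hypothetical two-state arm. This is not the paper's networked algorithm. The transition probabilities `P_ENGAGE`, the discount `gamma`, the reward (1 for each step the arm is engaged) and the polynomially decaying step size are all illustrative assumptions.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical two-state arm: state 1 = "engaged", state 0 = "disengaged".
# P_ENGAGE[(state, action)] = probability the arm is engaged at the next step.
P_ENGAGE: Dict[Tuple[int, int], float] = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.5, (1, 1): 0.9}


def q_learning(steps: int = 20000, gamma: float = 0.5, seed: int = 0) -> List[List[float]]:
    """Tabular Q-learning on the toy arm; returns q[state][action]."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    visits = [[0, 0], [0, 0]]
    s = 0
    for _ in range(steps):
        a = rng.randrange(2)  # uniform exploration is fine because Q-learning is off-policy
        reward = float(s)  # reward 1 for each step the arm is engaged
        s_next = 1 if rng.random() < P_ENGAGE[(s, a)] else 0
        visits[s][a] += 1
        alpha = 1.0 / visits[s][a] ** 0.6  # polynomially decaying step size
        target = reward + gamma * max(q[s_next])
        q[s][a] += alpha * (target - q[s][a])
        s = s_next
    return q


q = q_learning()
greedy_policy = [max(range(2), key=lambda a: q[s][a]) for s in range(2)]
print(greedy_policy)  # 1 = act on the arm, 0 = stay passive
```

Because intervening costs nothing in this toy, the learned greedy policy should act in both states. In an RMAB, a per-step budget shared across many arms is what makes choosing whom to act on nontrivial, and in the networked setting that choice is further coupled through interactions between arms.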

Findings

1. Q-learning outperforms $k$-step look-ahead methods.
2. Network effects significantly improve decision-making.
3. Theoretical guarantees ensure convergence of the approximation.

Abstract

Restless Multi-Armed Bandits (RMABs) are a powerful framework for sequential decision-making, widely applied to resource allocation and intervention optimization problems in public health. However, traditional RMABs assume independence among arms, which limits their ability to account for interactions between individuals that can be common and significant in real-world environments. This paper introduces Networked RMAB, a novel framework that integrates the RMAB model with the independent cascade model to capture interactions between arms in networked environments. We define the Bellman equation for networked RMAB and highlight the computational challenge posed by its exponentially large state and action spaces. To resolve this challenge, we establish the submodularity of the Bellman equation and apply the hill-climbing algorithm to achieve a $1 - \frac{1}{e}$ approximation guarantee in…
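Greedy hill-climbing can be sketched for a monotone submodular set function under a cardinality budget, which is the setting where the $1 - \frac{1}{e}$ guarantee applies. As a stand-in objective, this sketch uses the expected number of nodes activated under the independent cascade model, estimated by Monte Carlo; the paper's actual objective (its networked Bellman equation) is not reproduced here. The uniform edge probability `p`, the `trials` count and all function names are hypothetical, and Monte Carlo noise means the guarantee only holds approximately.

```python
import random
from typing import Dict, List, Set

# Hypothetical directed influence graph: node -> list of out-neighbours.
Graph = Dict[int, List[int]]


def simulate_cascade(graph: Graph, seeds: Set[int], p: float, rng: random.Random) -> Set[int]:
    """One run of the independent cascade model with a uniform edge probability p."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active = []
        for u in frontier:
            # Each newly activated node gets a single chance to activate each inactive neighbour.
            for v in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return active


def expected_spread(graph: Graph, seeds: Set[int], p: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected number of activated nodes."""
    rng = random.Random(seed)
    total = sum(len(simulate_cascade(graph, seeds, p, rng)) for _ in range(trials))
    return total / trials


def hill_climb(graph: Graph, budget: int, p: float, trials: int = 200) -> Set[int]:
    """Greedy hill-climbing: repeatedly add the node with the largest estimated marginal gain."""
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    chosen: Set[int] = set()
    for _ in range(min(budget, len(nodes))):
        base = expected_spread(graph, chosen, p, trials)
        best_node, best_gain = None, float("-inf")
        for v in sorted(nodes - chosen):
            gain = expected_spread(graph, chosen | {v}, p, trials) - base
            if gain > best_gain:
                best_node, best_gain = v, gain
        chosen.add(best_node)
    return chosen


# Toy example: a hub (0) pointing at five leaves, plus a separate edge 6 -> 7.
example_graph: Graph = {0: [1, 2, 3, 4, 5], 6: [7]}
print(sorted(hill_climb(example_graph, budget=2, p=1.0)))  # → [0, 6]
```

With `p=1.0` every cascade is deterministic: the hub reaches six nodes, so it is picked first, and node 6 then adds two more, beating any leaf. Each greedy step evaluates every remaining candidate, so the cost is $O(\text{budget} \cdot n)$ objective evaluations rather than a search over all $\binom{n}{\text{budget}}$ subsets.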

Taxonomy

Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Stochastic Gradient Optimization Techniques