Networked Restless Multi-Arm Bandits with Reinforcement Learning
Hanmo Zhang, Zenghui Sun, Kai Wang

TL;DR
This paper introduces Networked RMAB, integrating RMAB with network interactions, and develops an efficient Q-learning algorithm that outperforms existing methods on real-world data.
Contribution
It presents a novel Networked RMAB framework, establishes its theoretical properties, and develops a scalable Q-learning algorithm for networked environments.
Findings
Q-learning outperforms $k$-step look-ahead methods.
Modeling network effects between arms, rather than treating arms as independent, significantly improves decision-making.
Theoretical guarantees ensure convergence of the approximation.
Abstract
Restless Multi-Armed Bandits (RMABs) are a powerful framework for sequential decision-making, widely applied to resource allocation and intervention optimization problems in public health. However, traditional RMABs assume independence among arms, which limits their ability to account for interactions between individuals that can be common and significant in real-world environments. This paper introduces Networked RMAB, a novel framework that integrates the RMAB model with the independent cascade model to capture interactions between arms in networked environments. We define the Bellman equation for Networked RMAB and show that it is computationally challenging because the action and state spaces are exponentially large. To resolve this challenge, we establish the submodularity of the Bellman equation and apply the hill-climbing algorithm to achieve an approximation guarantee in…
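To illustrate the two ingredients named in the abstract, the following minimal Python sketch combines an independent cascade model with greedy hill-climbing under a budget. It is not the authors' implementation: the objective here is the estimated cascade spread, used as a stand-in for the submodular Bellman objective, and all function names (`sample_live_edge_graphs`, `expected_spread`, `hill_climb`) and the toy graph are hypothetical.

```python
import random
from collections import deque


def sample_live_edge_graphs(edges, num_samples, seed=0):
    """Sample live-edge graphs for the independent cascade model.

    Each directed edge (u, v, p) is kept independently with probability p.
    """
    rng = random.Random(seed)
    graphs = []
    for _ in range(num_samples):
        adj = {}
        for u, v, p in edges:
            if rng.random() < p:
                adj.setdefault(u, []).append(v)
        graphs.append(adj)
    return graphs


def reachable(adj, seeds):
    """Return the set of nodes reachable from `seeds` via breadth-first search."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def expected_spread(graphs, seeds):
    """Average number of nodes reached over the sampled live-edge graphs."""
    if not graphs:
        return 0.0
    return sum(len(reachable(g, seeds)) for g in graphs) / len(graphs)


def hill_climb(nodes, graphs, budget):
    """Greedy hill-climbing: repeatedly add the node with the largest marginal gain.

    Assumption: the objective (estimated spread) is monotone and submodular,
    standing in for the paper's Bellman objective.
    """
    chosen = []
    for _ in range(budget):
        current = expected_spread(graphs, chosen)
        best_node, best_gain = None, 0.0
        for v in nodes:
            if v in chosen:
                continue
            gain = expected_spread(graphs, chosen + [v]) - current
            if gain > best_gain:
                best_node, best_gain = v, gain
        if best_node is None:  # no remaining node improves the objective
            break
        chosen.append(best_node)
    return chosen


# Toy example: two chains, 0 -> 1 -> 2 and 3 -> 4, with certain propagation.
toy_edges = [(0, 1, 1.0), (1, 2, 1.0), (3, 4, 1.0)]
toy_graphs = sample_live_edge_graphs(toy_edges, num_samples=3)
toy_selection = hill_climb(range(5), toy_graphs, budget=2)
```

With propagation probability 1.0 on every edge, the sketch picks the head of each chain, since each subsequent pick is judged only by the nodes it adds beyond those already reached. In a Networked RMAB setting the per-step objective would instead come from the learned value function, which this sketch does not model.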
Taxonomy
Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Stochastic Gradient Optimization Techniques
