Towards a Pretrained Model for Restless Bandits via Multi-arm Generalization
Yunfan Zhao, Nikhil Behari, Edward Hughes, Edwin Zhang, Dheeraj Nagaraj, Karl Tuyls, Aparna Taneja, Milind Tambe

TL;DR
This paper introduces PreFeRMAB, a neural network-based pre-trained model for restless multi-arm bandits that generalizes zero-shot to unseen problems, handles continuous states, and improves sample efficiency for fine-tuning.
Contribution
The paper presents a novel pre-trained neural network model for RMABs that generalizes across problem instances, supports multi-action and continuous states, and includes a new update rule with convergence guarantees.
Findings
PreFeRMAB achieves zero-shot generalization to unseen RMABs.
The model improves sample efficiency in fine-tuning on specific instances.
Empirical results demonstrate advantages on real-world inspired problems.
Abstract
Restless multi-arm bandits (RMABs), a class of resource allocation problems with broad application in areas such as healthcare, online advertising, and anti-poaching, have recently been studied from a multi-agent reinforcement learning perspective. Prior RMAB research suffers from several limitations: for example, it fails to adequately address continuous states, and it requires retraining from scratch when arms opt in and opt out over time, a common challenge in many real-world applications. We address these limitations by developing a neural network-based pre-trained model (PreFeRMAB) that has general zero-shot ability on a wide range of previously unseen RMABs, and which can be fine-tuned on specific instances in a more sample-efficient way than retraining from scratch. Our model also accommodates general multi-action settings and discrete or continuous state spaces. To enable fast…
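To make the problem setting concrete, here is a minimal sketch of an RMAB with a per-step budget and arms that opt in and out. It is not PreFeRMAB and not the authors' code: the binary states, the transition probabilities, the 0.8 opt-in rate, the frozen state of opted-out arms, and the myopic greedy baseline are all assumptions chosen for illustration, and the names step_arm, greedy_policy, and simulate are hypothetical.

```python
import random


def step_arm(state, action, p_passive, p_active, rng):
    """Sample the next binary state of one arm.

    p_passive[s] / p_active[s] = P(next state is 1 | current state s).
    """
    p = p_active[state] if action == 1 else p_passive[state]
    return 1 if rng.random() < p else 0


def greedy_policy(states, mask, p_passive, p_active, budget):
    """Myopic baseline (an assumption, not the paper's policy):
    pull the opted-in arms with the largest one-step gain."""
    gains = []
    for i, s in enumerate(states):
        if mask[i]:  # only opted-in arms are eligible
            gains.append((p_active[i][s] - p_passive[i][s], i))
    gains.sort(reverse=True)
    chosen = {i for _, i in gains[:budget]}
    return [1 if i in chosen else 0 for i in range(len(states))]


def simulate(n_arms=5, budget=2, horizon=10, seed=0):
    rng = random.Random(seed)
    # Hypothetical per-arm transition probabilities, indexed by state 0/1.
    p_passive = [[rng.uniform(0.1, 0.4), rng.uniform(0.4, 0.7)]
                 for _ in range(n_arms)]
    # Acting raises the chance of reaching state 1 by an arm-specific boost.
    p_active = []
    for p in p_passive:
        boost = rng.uniform(0.1, 0.3)
        p_active.append([min(1.0, p[0] + boost), min(1.0, p[1] + boost)])

    states = [rng.randint(0, 1) for _ in range(n_arms)]
    total_reward = 0
    history = []
    for _ in range(horizon):
        # Assumption: each arm is opted in with probability 0.8 per step.
        mask = [rng.random() < 0.8 for _ in range(n_arms)]
        actions = greedy_policy(states, mask, p_passive, p_active, budget)
        history.append((mask, actions))
        for i in range(n_arms):
            if mask[i]:  # opted-out arms keep their state and earn nothing
                states[i] = step_arm(states[i], actions[i],
                                     p_passive[i], p_active[i], rng)
                total_reward += states[i]
    return total_reward, history


if __name__ == "__main__":
    reward, _ = simulate()
    print("total reward:", reward)
```

The mask is the part that matters here: the set of eligible arms changes from step to step, which is the opt-in and opt-out behaviour the abstract says forces prior methods to retrain from scratch.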
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management
