Towards a Pretrained Model for Restless Bandits via Multi-arm Generalization

Yunfan Zhao; Nikhil Behari; Edward Hughes; Edwin Zhang; Dheeraj Nagaraj; Karl Tuyls; Aparna Taneja; Milind Tambe

arXiv:2310.14526 · cs.LG · January 31, 2024 · 1 citation

TL;DR

This paper introduces PreFeRMAB, a neural network-based pre-trained model for restless multi-arm bandits that generalizes zero-shot to unseen problems, handles continuous states, and improves sample efficiency for fine-tuning.

Contribution

The paper presents a novel pre-trained neural network model for RMABs that generalizes across problem instances, supports multi-action settings and continuous state spaces, and includes a new update rule with convergence guarantees.
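Multi-action RMABs are commonly handled by Lagrangian relaxation. A multiplier λ prices each unit of budget, which lets every arm choose its action independently. The sketch below illustrates that generic decoupling step in Python with NumPy. It assumes known per-arm action values `q_values` and action costs `costs`, both hypothetical inputs. It is not the paper's update rule. The multiplier is found by bisection, which works because each arm's chosen cost can only fall as λ grows.

```python
import numpy as np

def decoupled_actions(q_values, costs, lam):
    # Each arm independently maximizes its penalized value Q_i(s_i, a) - lam * c(a).
    # q_values: (num_arms, num_actions) values at each arm's current state (assumed known)
    # costs:    (num_actions,) cost of each action; one action must cost 0 (passive)
    return np.argmax(q_values - lam * costs[None, :], axis=1)

def total_cost(q_values, costs, lam):
    # Budget consumed when every arm follows its decoupled choice for this lam
    return costs[decoupled_actions(q_values, costs, lam)].sum()

def find_lambda(q_values, costs, budget, lam_hi=1.0, iters=60):
    """Smallest lam (up to bisection tolerance) whose decoupled actions fit the budget.

    Relies on total cost being nonincreasing in lam and on a zero-cost action existing,
    so a large enough lam always makes every arm pick a zero-cost action."""
    assert costs.min() == 0.0, "need a zero-cost (passive) action"
    if total_cost(q_values, costs, 0.0) <= budget:
        return 0.0  # constraint is slack, so no penalty is needed
    while total_cost(q_values, costs, lam_hi) > budget:
        lam_hi *= 2.0  # widen the bracket until its upper end is feasible
    lam_lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lam_lo + lam_hi)
        if total_cost(q_values, costs, mid) <= budget:
            lam_hi = mid  # feasible: try a smaller penalty
        else:
            lam_lo = mid  # infeasible: a larger penalty is needed
    return lam_hi
```

The search returns the smallest feasible λ, so the resulting actions respect the budget but may leave part of it unused. A practical policy would typically fill any remaining budget greedily.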

Findings

1. PreFeRMAB achieves zero-shot generalization to unseen RMABs.
2. The model improves sample efficiency when fine-tuning on specific instances.
3. Empirical results demonstrate advantages on real-world-inspired problems.

Abstract

Restless multi-arm bandits (RMABs), a class of resource allocation problems with broad application in areas such as healthcare, online advertising, and anti-poaching, have recently been studied from a multi-agent reinforcement learning perspective. Prior RMAB research suffers from several limitations, e.g., it fails to adequately address continuous states, and requires retraining from scratch when arms opt in and opt out over time, a common challenge in many real-world applications. We address these limitations by developing a neural network-based pre-trained model (PreFeRMAB) that has general zero-shot ability on a wide range of previously unseen RMABs, and which can be fine-tuned on specific instances in a more sample-efficient way than retraining from scratch. Our model also accommodates general multi-action settings and discrete or continuous state spaces. To enable fast…
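The following is a hypothetical toy RMAB in Python with NumPy that makes the opt-in/opt-out setting concrete. Each arm is a two-state Markov chain (0 = bad, 1 = good) whose transition probabilities depend on whether it is acted on. At most `budget` arms can be acted on per step. Arms leave at random and are replaced by freshly sampled ones. The class `ToyRMAB`, its parameters, and the `myopic_policy` baseline are illustrative assumptions, not the paper's environments or method. A pretrained model such as PreFeRMAB would aim to act well on newly arriving arms without retraining. The baseline here simply ranks arms by one-step gain.

```python
import numpy as np

class ToyRMAB:
    """Hypothetical two-state restless bandit with arms that opt out and are replaced."""

    def __init__(self, num_arms, budget, opt_out_prob=0.05, seed=0):
        self.rng = np.random.default_rng(seed)
        self.num_arms = num_arms
        self.budget = budget
        self.opt_out_prob = opt_out_prob
        # trans[i, s, a] = P(next state is good | arm i, current state s, action a)
        self.trans = np.stack([self._sample_arm() for _ in range(num_arms)])
        self.states = self.rng.integers(0, 2, size=num_arms)

    def _sample_arm(self):
        # Passive probabilities per state, plus a nonnegative boost when the arm is acted on
        passive = self.rng.uniform(0.0, 0.6, size=2)
        boost = self.rng.uniform(0.0, 0.4, size=2)
        return np.stack([passive, passive + boost], axis=1)  # shape (2 states, 2 actions)

    def step(self, actions):
        assert actions.sum() <= self.budget, "budget constraint violated"
        p_good = self.trans[np.arange(self.num_arms), self.states, actions]
        self.states = (self.rng.random(self.num_arms) < p_good).astype(int)
        reward = int(self.states.sum())  # number of arms now in the good state
        # Opt-out / opt-in: each departing arm is replaced by a newly sampled arm
        leaving = self.rng.random(self.num_arms) < self.opt_out_prob
        for i in np.flatnonzero(leaving):
            self.trans[i] = self._sample_arm()
            self.states[i] = self.rng.integers(0, 2)
        return reward

def myopic_policy(env):
    """Act on the `budget` arms with the largest gain P(good | s, act) - P(good | s, passive)."""
    idx = np.arange(env.num_arms)
    gain = env.trans[idx, env.states, 1] - env.trans[idx, env.states, 0]
    actions = np.zeros(env.num_arms, dtype=int)
    actions[np.argsort(-gain)[: env.budget]] = 1
    return actions

env = ToyRMAB(num_arms=20, budget=5)
total_reward = sum(env.step(myopic_policy(env)) for _ in range(100))
```

Because replaced arms carry new, previously unseen transition parameters, any policy trained on a fixed set of arms would face exactly the distribution shift described above.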

