Optimistic Whittle Index Policy: Online Learning for Restless Bandits

Kai Wang; Lily Xu; Aparna Taneja; Milind Tambe

arXiv:2205.15372 · cs.LG · November 21, 2023 · 1 citation

TL;DR

This paper introduces UCWhittle, an online learning algorithm for restless bandits that uses an upper confidence bound approach to estimate transition dynamics and compute optimistic Whittle indices, achieving sublinear regret.

Contribution

The paper presents the first online learning algorithm for RMABs built on the Whittle index policy, using upper confidence bounds (UCB) to handle unknown transition dynamics.
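As a rough illustration of the action-selection step in a Whittle index policy, the sketch below pulls the arms with the largest index values under a per-round budget. It assumes the indices for each arm's current state have already been computed (in UCWhittle they would be the optimistic indices); the function name `select_arms` and the `budget` parameter are hypothetical and not taken from the paper or its repository.

```python
import numpy as np

def select_arms(whittle_indices, budget):
    """Return a 0/1 action vector pulling the `budget` arms with the highest index.

    Hypothetical helper: `whittle_indices[i]` is assumed to be the (optimistic)
    Whittle index of arm i in its current state, computed elsewhere.
    """
    whittle_indices = np.asarray(whittle_indices, dtype=float)
    if not 0 <= budget <= whittle_indices.size:
        raise ValueError("budget must be between 0 and the number of arms")
    # Sort arms by descending index; a stable sort breaks ties by arm order.
    top = np.argsort(-whittle_indices, kind="stable")[:budget]
    action = np.zeros(whittle_indices.size, dtype=int)
    action[top] = 1
    return action

action = select_arms([0.1, 0.9, 0.4, 0.7], budget=2)
```

Here arms 1 and 3 carry the two largest indices, so they are the ones pulled. How the indices themselves are obtained is the harder part and is not shown.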

Findings

01. UCWhittle achieves sublinear regret of O(H√(T log T)) over T episodes with constant horizon H.

02. It outperforms existing online learning baselines in three domains.

03. It is shown to be effective on a real-world maternal and childcare dataset.

Abstract

Restless multi-armed bandits (RMABs) extend multi-armed bandits to allow for stateful arms, where the state of each arm evolves restlessly with different transitions depending on whether that arm is pulled. Solving RMABs requires information on transition dynamics, which are often unknown upfront. To plan in RMAB settings with unknown transitions, we propose the first online learning algorithm based on the Whittle index policy, using an upper confidence bound (UCB) approach to learn transition dynamics. Specifically, we estimate confidence bounds of the transition probabilities and formulate a bilinear program to compute optimistic Whittle indices using these estimates. Our algorithm, UCWhittle, achieves sublinear $O(H \sqrt{T \log T})$ frequentist regret to solve RMABs with unknown transitions in $T$ episodes with a constant horizon $H$. Empirically, we demonstrate that UCWhittle…
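To make the "confidence bounds of the transition probabilities" step concrete, here is a minimal sketch for arms with binary states. It assumes a generic Hoeffding-style radius, which is not necessarily the radius used in the paper, and it covers only the estimation step, not the bilinear program that turns these bounds into optimistic Whittle indices. The function name, the count arrays, and the `delta` parameter are all hypothetical.

```python
import numpy as np

def transition_confidence_bounds(success_counts, visit_counts, delta=0.05):
    """Lower/upper bounds on P(next state = 1 | state, action) from counts.

    Assumption: a Hoeffding-style radius sqrt(log(2/delta) / (2 n)); the
    paper's exact confidence radius may differ.
    """
    success_counts = np.asarray(success_counts, dtype=float)
    visit_counts = np.asarray(visit_counts, dtype=float)
    n = np.maximum(visit_counts, 1.0)  # avoid division by zero for unvisited pairs
    # Empirical estimate; unvisited (state, action) pairs default to 0.5.
    p_hat = np.where(visit_counts > 0, success_counts / n, 0.5)
    radius = np.sqrt(np.log(2.0 / delta) / (2.0 * n))
    lower = np.clip(p_hat - radius, 0.0, 1.0)
    upper = np.clip(p_hat + radius, 0.0, 1.0)
    return lower, upper

# One (state, action) pair never visited, one visited 100 times with 70 successes.
lower, upper = transition_confidence_bounds([0, 70], [0, 100])
```

An unvisited pair gets the uninformative interval [0, 1], and the interval tightens around the empirical estimate as visits accumulate. An optimistic planner then picks, within these intervals, the transition probabilities most favorable to the objective.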

Code & Models

Repositories

lily-x/online-rmab (official)

Taxonomy

Topics: Advanced Bandit Algorithms Research