An Optimal-Control Approach to Infinite-Horizon Restless Bandits: Achieving Asymptotic Optimality with Minimal Assumptions

Chen YAN

arXiv:2403.11913 · math.OC · March 19, 2024 · CDC · 1 citation

TL;DR

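This paper introduces an optimal-control approach to infinite-horizon restless bandits. It shows that an asymptotically optimal policy exists under a reachability assumption weaker than prior arm-level conditions, and that such a policy can be built with an "align and steer" strategy in which model predictive control handles the steering.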

Contribution

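The paper replaces earlier arm-level assumptions with a single condition: reachability of an optimal stationary state in the optimal-control problem. This condition yields asymptotic optimality without the unichain condition. The paper also proposes the "align and steer" strategy for constructing the policy.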

Findings

01 Model predictive control outperforms existing policies in numerical tests.

02 Reachability of an optimal stationary state suffices for asymptotic optimality.

03 Prior assumptions on the arm-level MDP, notably the unichain condition, are no longer needed.

Abstract

We adopt an optimal-control framework for addressing the undiscounted infinite-horizon discrete-time restless $N$-armed bandit problem. Unlike most studies that rely on constructing policies based on the relaxed single-armed Markov Decision Process (MDP), we propose relaxing the entire bandit MDP as an optimal-control problem through the certainty equivalence control principle. Our main contribution is demonstrating that the reachability of an optimal stationary state within the optimal-control problem is a sufficient condition for the existence of an asymptotically optimal policy. Such a policy can be devised using an "align and steer" strategy. This reachability assumption is less stringent than any prior assumptions imposed on the arm-level MDP; notably, the unichain condition is no longer needed. Through numerical examples, we show that employing model predictive control for steering…
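To make the certainty-equivalence relaxation mentioned in the abstract more concrete, here is a minimal sketch of one step of deterministic dynamics of the kind commonly used for restless bandits. It is an illustration under stated assumptions, not the paper's formulation: the names `ce_step` and `is_feasible`, the budget fraction `alpha`, the equality form of the budget constraint, and the two-state transition matrices are all hypothetical.

```python
import numpy as np

def ce_step(y, P):
    """One step of certainty-equivalent (deterministic) dynamics.

    y[s, a]: fraction of arms in state s that receive action a.
    P[a, s, t]: probability of moving from s to t under action a.
    Returns the next state distribution over arm states.
    """
    return sum(y[:, a] @ P[a] for a in range(P.shape[0]))

def is_feasible(x, y, alpha, tol=1e-9):
    """Check that control y is admissible at state x.

    Assumed constraints: y is nonnegative, its rows sum to x, and the
    activated mass (action 1) equals the budget fraction alpha.
    """
    return bool(
        np.all(y >= -tol)
        and np.allclose(y.sum(axis=1), x, atol=tol)
        and abs(y[:, 1].sum() - alpha) <= tol
    )

# Hypothetical two-state arm: P[0] is passive, P[1] is active.
P = np.array([
    [[0.9, 0.1], [0.5, 0.5]],
    [[0.2, 0.8], [0.3, 0.7]],
])
x = np.array([0.6, 0.4])        # current fraction of arms in each state
alpha = 0.5                     # assumed fraction of arms activated per step
y = np.array([[0.1, 0.5],       # state 0: 0.1 passive, 0.5 active
              [0.4, 0.0]])      # state 1: all passive

x_next = ce_step(y, P)          # next state fractions, still summing to 1
```

In this picture, a stationary state is a pair `(x, y)` with `ce_step(y, P)` equal to `x`, and steering amounts to choosing a sequence of feasible controls that drives `x` toward such a point.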

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Cognitive Radio Networks and Spectrum Sensing