Risk-Aware Decision Making in Restless Bandits: Theory and Algorithms for Planning and Learning

Nima Akbarzadeh; Yossiri Adulyasak; Erick Delage

arXiv:2410.23029·cs.LG·February 20, 2026

Open Access

TL;DR

This paper extends restless bandits to risk-aware objectives, providing indexability conditions, a Whittle index solution for planning, and a Thompson sampling approach for learning. Numerical experiments show reduced risk in the applications studied.

Contribution

It introduces risk-aware objectives into restless bandits, establishes indexability conditions, and develops planning and learning algorithms with theoretical guarantees.

Findings

01

Proposed a risk-aware Whittle index for restless bandits.

02

Developed a Thompson sampling algorithm with sublinear regret.

03

Numerical experiments show effective risk mitigation in applications.

Abstract

In restless bandits, a central agent must optimally distribute limited resources across several bandits (arms), each of which is a Markov decision process. In this work, we generalize the traditional restless bandit problem, which has a risk-neutral objective, by incorporating risk-awareness. This is important in many real-world applications, especially when the decision maker seeks to mitigate downside risks. We establish indexability conditions for a risk-aware objective and provide, for the first time, a solution based on the Whittle index for the planning problem with finite-horizon non-stationary and infinite-horizon stationary Markov decision processes. In addition, we address the learning problem, in which the true transition probabilities are unknown, by proposing a Thompson sampling approach, and we show that it achieves bounded regret that scales…
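
To make the two ingredients named in the abstract concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes a Dirichlet posterior over each arm's transition probabilities (a common choice for Thompson sampling, not confirmed by the abstract) and an index policy that activates the arms with the largest index values. The function names, the `budget` parameter, and the index values are hypothetical placeholders; the risk-aware Whittle index computation itself is not shown.

```python
import numpy as np


def sample_transition_model(counts, rng):
    """Draw one transition model from a Dirichlet posterior (assumed prior).

    counts[s, a, s_next] holds prior pseudo-counts plus observed transitions
    for a single arm.
    """
    n_states, n_actions, _ = counts.shape
    model = np.empty_like(counts, dtype=float)
    for s in range(n_states):
        for a in range(n_actions):
            # Each (state, action) pair gets its own sampled distribution.
            model[s, a] = rng.dirichlet(counts[s, a])
    return model


def select_arms(indices, budget):
    """Activate the `budget` arms with the largest current index values."""
    order = np.argsort(-np.asarray(indices), kind="stable")
    return sorted(order[:budget].tolist())


rng = np.random.default_rng(0)

# One arm with 2 states and 2 actions, starting from a uniform prior.
counts = np.ones((2, 2, 2))
counts[0, 1, 1] += 5  # five observed transitions from state 0, action 1, to state 1
model = sample_transition_model(counts, rng)

# Placeholder index values for four arms; two may be activated per step.
active = select_arms([0.3, 1.2, -0.1, 0.8], budget=2)
```

In a learning loop of this kind, the sampled model would be used to recompute the indices at the start of each episode, and the observed transitions would then be added to `counts`.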

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Reinforcement Learning in Robotics

Methods: Sparse Evolutionary Training