Optimal Control of Fluid Restless Multi-armed Bandits: A Machine Learning Approach

Dimitris Bertsimas; Cheol Woo Kim; José Niño-Mora

arXiv:2502.03725·cs.LG·May 8, 2026

TL;DR

This paper introduces a machine learning framework for the optimal control of fluid restless multi-armed bandit problems (FRMABPs). The learned policies are high quality and run up to 26 million times faster than direct algorithms.

Contribution

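The paper develops a numerical algorithm that generates training data, and a learning approach based on Optimal Classification Trees with hyperplane splits (OCT-H), for controlling FRMABPs whose state equations are affine or quadratic in the state variables.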

Findings

01

The method yields high-quality state feedback policies on machine maintenance, epidemic control, and fisheries control problems.

02

Once learned, the policy achieves a speed-up of up to 26 million times over direct algorithms.

03

A nonlinear transformation of the feature vectors, based on structural properties of FRMABPs, enhances the training set.

Abstract

We present a novel machine learning framework for the optimal control of fluid restless multi-armed bandit problems (FRMABPs) with state equations that are either affine or quadratic in the state variables. By establishing fundamental properties of FRMABPs, we develop an efficient numerical algorithm that generates a comprehensive training set by solving multiple instances with diverse initial states. We further enhance this training set by applying a nonlinear transformation to the feature vectors, leveraging structural properties of FRMABPs. A time-dependent state feedback policy is then learned using Optimal Classification Trees with hyperplane splits (OCT-H). We test our approach on machine maintenance, epidemic control, and fisheries control problems, demonstrating that our method yields high-quality state feedback policies. Furthermore, once a policy is learned, it achieves a…
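To make the idea of a time-dependent state feedback policy with hyperplane splits concrete, here is a minimal sketch of how such a tree could be evaluated. It is an illustration under stated assumptions, not the authors' implementation: the class names, the feature layout (time followed by the state), the single-arm action labels, and every coefficient in `toy_tree` are hypothetical and hand-written rather than learned. It does not include the paper's nonlinear feature transformation or its training-set generation.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    action: int  # index of the arm to activate (hypothetical labeling)


@dataclass
class Split:
    weights: Sequence[float]  # hyperplane normal over the feature vector
    threshold: float
    left: "Node"   # taken when weights . features < threshold
    right: "Node"  # taken otherwise


Node = Union[Leaf, Split]


def features(t: float, state: Sequence[float]) -> list:
    """Feature vector: current time followed by the state variables (assumed layout)."""
    return [t, *state]


def policy(tree: Node, t: float, state: Sequence[float]) -> int:
    """Walk the tree from the root to a leaf and return that leaf's action."""
    z = features(t, state)
    node = tree
    while isinstance(node, Split):
        score = sum(w * zi for w, zi in zip(node.weights, z))
        node = node.left if score < node.threshold else node.right
    return node.action


# Hand-written toy tree with made-up coefficients (not learned, not from the paper).
toy_tree = Split(
    weights=[0.0, 1.0, -1.0],     # compares x1 with x2, ignores time
    threshold=0.0,
    left=Leaf(action=1),          # x1 < x2: activate arm 1
    right=Split(
        weights=[1.0, 0.0, 0.0],  # splits on time only
        threshold=5.0,
        left=Leaf(action=0),      # t < 5: activate arm 0
        right=Leaf(action=1),     # t >= 5: activate arm 1
    ),
)
```

Evaluating the policy costs only a few dot products per decision, for example `policy(toy_tree, 1.0, [0.9, 0.1])`. That is the kind of lookup that can be much cheaper than re-solving a control problem for each new initial state.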
