Optimal Control of Fluid Restless Multi-armed Bandits: A Machine Learning Approach
Dimitris Bertsimas, Cheol Woo Kim, José Niño-Mora

TL;DR
This paper introduces a machine learning framework for optimal control of fluid restless multi-armed bandit problems, achieving high-quality policies with significant computational speed-ups.
Contribution
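The paper develops a numerical algorithm for generating training data, together with a learning approach based on Optimal Classification Trees with hyperplane splits (OCT-H), for controlling fluid restless multi-armed bandit problems (FRMABPs) whose state equations are affine or quadratic in the state variables.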
Findings
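The method yields high-quality state feedback policies on machine maintenance, epidemic control, and fisheries control problems.
Once learned, a policy achieves a speed-up of up to 26 million times over direct algorithms.
A nonlinear transformation of the feature vectors, based on structural properties of FRMABPs, enhances the training set.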
Abstract
We present a novel machine learning framework for the optimal control of fluid restless multi-armed bandit problems (FRMABPs) with state equations that are either affine or quadratic in the state variables. By establishing fundamental properties of FRMABPs, we develop an efficient numerical algorithm that generates a comprehensive training set by solving multiple instances with diverse initial states. We further enhance this training set by applying a nonlinear transformation to the feature vectors, leveraging structural properties of FRMABPs. A time-dependent state feedback policy is then learned using Optimal Classification Trees with hyperplane splits (OCT-H). We test our approach on machine maintenance, epidemic control, and fisheries control problems, demonstrating that our method yields high-quality state feedback policies. Furthermore, once a policy is learned, it achieves a…
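To make the idea of a time-dependent state feedback policy with hyperplane splits concrete, here is a minimal Python sketch of how such a tree could be evaluated at run time. It is an illustration only, not the authors' implementation and not the OCT-H training procedure: the names `Leaf`, `Split`, `policy`, and `toy_tree` are hypothetical, the split coefficients are hand-picked rather than learned, and the sketch assumes the tree outputs a discrete control label from the feature vector (t, x_1, ..., x_n).

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    action: int  # hypothetical discrete control label


@dataclass
class Split:
    weights: Sequence[float]  # hyperplane normal over features (t, x_1, ..., x_n)
    threshold: float
    left: "Node"   # followed when weights . features < threshold
    right: "Node"  # followed otherwise


Node = Union[Leaf, Split]


def policy(tree: Node, t: float, state: Sequence[float]) -> int:
    """Walk the tree for the feature vector (t, state) and return the leaf's action."""
    features = [t, *state]
    node = tree
    while isinstance(node, Split):
        score = sum(w * f for w, f in zip(node.weights, features))
        node = node.left if score < node.threshold else node.right
    return node.action


# Hand-picked toy tree (assumption, not learned from data):
# the root compares x_1 with x_2, the second split depends on time t.
toy_tree = Split(
    weights=[0.0, 1.0, -1.0],
    threshold=0.0,
    left=Leaf(action=1),
    right=Split(
        weights=[1.0, 0.0, 0.0],
        threshold=5.0,
        left=Leaf(action=0),
        right=Leaf(action=1),
    ),
)

print(policy(toy_tree, t=1.0, state=[0.8, 0.2]))
```

Evaluating a tree like this takes only a handful of dot products per decision, which is the kind of cheap lookup that makes a learned policy much faster to apply than re-solving the control problem from each new state.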
