Restless and Uncertain: Robust Policies for Restless Bandits via Deep Multi-Agent Reinforcement Learning
Jackson A. Killian, Lily Xu, Arpita Biswas, Milind Tambe

TL;DR
This paper develops a deep reinforcement learning framework for computing robust policies for restless multi-armed bandits (RMABs) whose dynamics are known only with uncertainty, using a minimax regret objective.
Contribution
It introduces DDLPO, a novel deep RL algorithm for robust RMAB policies, and extends it to a multi-agent setting to handle adversarial uncertainty.
Findings
DDLPO effectively reduces sample complexity.
The multi-agent extension achieves robust policies in experiments.
The approach guarantees convergence to minimax regret policies.
Abstract
We introduce robustness in restless multi-armed bandits (RMABs), a popular model for constrained resource allocation among independent stochastic processes (arms). Nearly all RMAB techniques assume stochastic dynamics are precisely known. However, in many real-world settings, dynamics are estimated with significant uncertainty, e.g., via historical data, which can lead to bad outcomes if ignored. To address this, we develop an algorithm to compute minimax regret-robust policies for RMABs. Our approach uses a double oracle framework (oracles for agent and nature), which is often used for single-process robust planning but requires significant new techniques to accommodate the combinatorial nature of RMABs. Specifically, we design a deep reinforcement learning (RL) algorithm, DDLPO, which tackles the combinatorial challenge by learning an auxiliary…
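To make the minimax regret objective named in the abstract concrete, here is a minimal sketch over a small finite table of returns. It is not the paper's DDLPO or double oracle procedure: the function name `minimax_regret_policy`, the table `toy_returns`, and the restriction to a handful of fixed policies and parameter settings are all assumptions made purely for illustration. Regret is taken in the usual sense, as the gap between the best return achievable under a given setting of the dynamics and the return the chosen policy actually obtains.

```python
from typing import List, Tuple


def minimax_regret_policy(returns: List[List[float]]) -> Tuple[int, float]:
    """Pick the policy whose worst-case regret is smallest.

    returns[i][j] is the (hypothetical) return of policy i when nature
    fixes the uncertain dynamics to setting j.
    """
    n_policies = len(returns)
    n_settings = len(returns[0])

    # Best return any listed policy achieves under each setting.
    best_per_setting = [
        max(returns[i][j] for i in range(n_policies)) for j in range(n_settings)
    ]

    # Worst-case regret of each policy across all settings.
    worst_regret = [
        max(best_per_setting[j] - returns[i][j] for j in range(n_settings))
        for i in range(n_policies)
    ]

    # The minimax regret choice minimizes that worst case.
    chosen = min(range(n_policies), key=lambda i: worst_regret[i])
    return chosen, worst_regret[chosen]


# Illustrative numbers only: 3 candidate policies, 2 settings of the dynamics.
toy_returns = [
    [10.0, 2.0],  # strong under setting 0, weak under setting 1
    [7.0, 6.0],   # balanced
    [4.0, 8.0],   # strong under setting 1, weak under setting 0
]

best_policy, regret = minimax_regret_policy(toy_returns)
print(best_policy, regret)
```

In this toy table the balanced policy is selected, because each specialized policy suffers a larger regret under the setting it is not tuned for. In the setting the abstract describes, the policy and parameter spaces are too large to enumerate, which is what motivates the double oracle framework with separate agent and nature oracles.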
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)
