Restless and Uncertain: Robust Policies for Restless Bandits via Deep Multi-Agent Reinforcement Learning

Jackson A. Killian; Lily Xu; Arpita Biswas; Milind Tambe

arXiv:2107.01689·cs.LG·June 23, 2022

TL;DR

This paper develops a deep reinforcement learning framework that computes robust policies for restless multi-armed bandits (RMABs) whose dynamics are only known with uncertainty, using a minimax regret objective.

Contribution

The paper introduces DDLPO, a deep RL algorithm for RMAB policies, and extends it to a multi-agent setting (agent versus nature) to handle adversarial uncertainty.

Findings

01. DDLPO effectively reduces sample complexity.

02. The multi-agent extension achieves robust policies in experiments.

03. The approach guarantees convergence to minimax regret policies.

Abstract

We introduce robustness in restless multi-armed bandits (RMABs), a popular model for constrained resource allocation among independent stochastic processes (arms). Nearly all RMAB techniques assume stochastic dynamics are precisely known. However, in many real-world settings, dynamics are estimated with significant uncertainty, e.g., via historical data, which can lead to bad outcomes if ignored. To address this, we develop an algorithm to compute minimax regret-robust policies for RMABs. Our approach uses a double oracle framework (oracles for agent and nature), which is often used for single-process robust planning but requires significant new techniques to accommodate the combinatorial nature of RMABs. Specifically, we design a deep reinforcement learning (RL) algorithm, DDLPO, which tackles the combinatorial challenge by learning an auxiliary…
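
As a rough illustration of the minimax regret objective named in the abstract, the sketch below picks the regret-minimizing policy from a small hand-written reward table. It is not the paper's DDLPO algorithm or its double oracle procedure. The finite policy set, the two candidate settings for nature, the reward values, and the function names `regret_table` and `minimax_regret_policy` are all assumptions made for this example.

```python
def regret_table(rewards):
    """Regret of each policy (row) under each setting of nature (column).

    Regret is the gap between the best reward achievable under that
    setting and the reward the policy actually obtains.
    """
    n_nature = len(rewards[0])
    # Best achievable reward for each setting of nature.
    best = [max(row[j] for row in rewards) for j in range(n_nature)]
    return [[best[j] - row[j] for j in range(n_nature)] for row in rewards]


def minimax_regret_policy(rewards):
    """Return (index, regret) of the policy whose worst-case regret is smallest."""
    regrets = regret_table(rewards)
    worst = [max(row) for row in regrets]  # nature picks the worst column
    idx = min(range(len(worst)), key=worst.__getitem__)
    return idx, worst[idx]


# Hypothetical rewards: 3 candidate policies x 2 possible dynamics.
rewards = [
    [10.0, 2.0],  # strong under setting 0, weak under setting 1
    [7.0, 5.0],   # moderate under both
    [3.0, 6.0],   # weak under setting 0, strong under setting 1
]

policy, regret = minimax_regret_policy(rewards)
print(policy, regret)
```

Enumerating policies like this only works for toy cases. Per the abstract, the paper instead uses a double oracle framework with learned agent and nature oracles to cope with the combinatorial structure of RMABs.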

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)