Neural Index Policies for Restless Multi-Action Bandits with Heterogeneous Budgets
Himadri S. Pandey, Kai Wang, Gian-Gabriel P. Garcia

TL;DR
This paper introduces a neural-network-based index policy for complex restless multi-armed bandit problems with multiple actions and heterogeneous budgets, enabling scalable, near-optimal decision-making.
Contribution
It proposes a novel neural index policy framework that unifies index prediction and constrained optimization for multi-action restless multi-armed bandits (RMABs) with heterogeneous budgets.
Findings
Achieves within 5% of oracle performance
Strictly enforces heterogeneous budget constraints
Scales to hundreds of arms efficiently
Abstract
Restless multi-armed bandits (RMABs) provide a scalable framework for sequential decision-making under uncertainty, but classical formulations assume binary actions and a single global budget. Real-world settings, such as healthcare, often involve multiple interventions with heterogeneous costs and constraints, where such assumptions break down. We introduce a Neural Index Policy (NIP) for multi-action RMABs with heterogeneous budget constraints. Our approach learns to assign budget-aware indices to arm-action pairs using a neural network and converts them into feasible allocations via a differentiable knapsack layer formulated as an entropy-regularized optimal transport (OT) problem. The resulting model unifies index prediction and constrained optimization in a single end-to-end differentiable framework, enabling gradient-based training directly on decision quality. The network is…
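The abstract does not spell out the knapsack layer, so the following is only a minimal sketch of the general idea, under simplifying assumptions that are ours rather than the paper's. We assume every active action has unit cost, so each heterogeneous budget becomes a per-action capacity (a count of arms); action 0 is a free passive action that absorbs leftover arms; and a fixed score matrix stands in for the network's indices. Arms are the OT sources (one unit of mass each), actions are the sinks (capacity equal to the budget), and Sinkhorn iterations solve the entropy-regularized problem, yielding a soft plan that is a smooth function of the scores. The names `sinkhorn_allocation`, `round_to_feasible`, and `epsilon` are hypothetical, and the greedy rounding step is an illustrative deployment-time choice that the abstract does not describe.

```python
import math


def sinkhorn_allocation(scores, budgets, epsilon=0.1, n_iters=500):
    """Soft arm-to-action plan from entropy-regularized optimal transport.

    scores[i][j]: index for giving action j to arm i (stand-in for the
        network's output); column 0 is the free passive action.
    budgets[j - 1]: number of arms that may receive active action j
        (assumes unit cost per active action).
    After the final column step, column j sums to its capacity; rows sum
    to 1 up to Sinkhorn convergence error.
    """
    n, k = len(scores), len(scores[0])
    passive = n - sum(budgets)  # arms left over for the passive action
    if passive < 0:
        raise ValueError("total budget exceeds the number of arms")
    col_mass = [float(passive)] + [float(b) for b in budgets]
    # Gibbs kernel exp(score / epsilon). Subtracting each row's max only
    # rescales that row, which the row scaling u absorbs, and avoids overflow.
    kernel = []
    for row in scores:
        top = max(row)
        kernel.append([math.exp((s - top) / epsilon) for s in row])
    u, v = [1.0] * n, [1.0] * k
    for _ in range(n_iters):
        # Row step: each arm distributes exactly one unit of mass.
        for i in range(n):
            u[i] = 1.0 / sum(kernel[i][j] * v[j] for j in range(k))
        # Column step: each action receives exactly its capacity.
        for j in range(k):
            v[j] = col_mass[j] / sum(kernel[i][j] * u[i] for i in range(n))
    return [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]


def round_to_feasible(plan, budgets):
    """Hypothetical greedy rounding: give active actions to the arms with the
    most plan mass on them, never exceeding any action's budget."""
    remaining = list(budgets)
    assignment = [0] * len(plan)  # 0 = passive
    active_pairs = sorted(
        ((plan[i][j], i, j) for i in range(len(plan)) for j in range(1, len(plan[i]))),
        reverse=True,
    )
    for _, i, j in active_pairs:
        if assignment[i] == 0 and remaining[j - 1] > 0:
            assignment[i] = j
            remaining[j - 1] -= 1
    return assignment


scores = [
    [0.0, 0.9, 0.2],
    [0.0, 0.1, 0.8],
    [0.0, 0.3, 0.3],
    [0.0, 0.2, 0.1],
]
budgets = [1, 1]  # one arm may get action 1, one arm may get action 2
plan = sinkhorn_allocation(scores, budgets)
print(round_to_feasible(plan, budgets))  # → [1, 2, 0, 0]
```

Smaller `epsilon` pushes the soft plan toward a hard assignment but slows Sinkhorn convergence. To train on decision quality as the abstract describes, one would write the same iterations in an autodiff framework so that gradients flow from a downstream loss back to the scores.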
