Minimizing Cost Rather Than Maximizing Reward in Restless Multi-Armed Bandits
R. Teal Witter, Lisa Hellerstein

TL;DR
This paper introduces a constrained minimization framework for Restless Multi-Armed Bandits (RMABs) in which the goal is to reach a reward threshold at minimum total cost. It extends the Whittle index approach to this setting and analyzes heuristic solutions.
Contribution
The paper formulates a constrained minimization problem for RMABs, defines a Whittle index for this setting, and compares heuristics for solving the minimization problem.
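To make the contrast between the two formulations concrete, here is a minimal single-step sketch. It is not the paper's algorithm and it ignores the state dynamics that make RMABs hard. The `Arm` fields, the two selection functions, and the numbers are all hypothetical, and `index` stands in for any priority score such as a Whittle-style index.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    name: str
    expected_reward: float  # hypothetical expected one-step reward if pulled
    cost: float             # hypothetical cost of pulling the arm once
    index: float            # any priority score (assumed given, not computed here)


def select_max_reward(arms, budget):
    """Budgeted form: take arms in decreasing index order while they fit the budget."""
    chosen, spent = [], 0.0
    for arm in sorted(arms, key=lambda a: a.index, reverse=True):
        if spent + arm.cost <= budget:
            chosen.append(arm.name)
            spent += arm.cost
    return chosen


def select_min_cost(arms, threshold):
    """Threshold form: take arms in decreasing index order until the
    accumulated expected reward reaches the threshold, then stop."""
    chosen, reward, cost = [], 0.0, 0.0
    for arm in sorted(arms, key=lambda a: a.index, reverse=True):
        if reward >= threshold:
            break
        chosen.append(arm.name)
        reward += arm.expected_reward
        cost += arm.cost
    # The final flag reports whether the threshold was actually met.
    return chosen, cost, reward >= threshold


arms = [
    Arm("a", expected_reward=1.0, cost=1.0, index=0.8),
    Arm("b", expected_reward=0.5, cost=1.0, index=0.3),
    Arm("c", expected_reward=0.75, cost=1.0, index=0.6),
]
print(select_max_reward(arms, budget=1.0))
print(select_min_cost(arms, threshold=1.5))
```

In the budgeted form the stopping rule is the budget and the reward is whatever results. In the threshold form the stopping rule is the reward target and the cost is whatever results, which is the shift in perspective the paper describes.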
Findings
The Whittle index for the minimization problem can be computed from the Whittle index for the maximization problem.
Even a bi-criteria approximate version of the minimization problem is PSPACE-hard.
Heuristic performance varies: some heuristics reach optimal solutions, while others fail.
Abstract
Restless Multi-Armed Bandits (RMABs) offer a powerful framework for solving resource-constrained maximization problems. However, the formulation can be inappropriate for settings where the limiting constraint is a reward threshold rather than a budget. We introduce a constrained minimization problem for RMABs that balances the goal of achieving a reward threshold while minimizing total cost. We show that even a bi-criteria approximate version of the problem is PSPACE-hard. Motivated by the hardness result, we define a decoupled problem, indexability, and a Whittle index for the minimization problem, mirroring the corresponding concepts for the maximization problem. Further, we show that the Whittle index for the minimization problem can easily be computed from the Whittle index for the maximization problem. Consequently, Whittle index results on RMAB instances for the maximization…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Mind wandering and attention
