Approximation Algorithms for Restless Bandit Problems
Sudipto Guha, Kamesh Munagala, Peng Shi

TL;DR
This paper gives a (2+epsilon)-approximation algorithm for Feedback MAB, a special case of the restless bandit problem, and extends the approach to a broader subclass (Monotone bandits) and to side constraints such as blocking and switching costs.
Contribution
The paper presents the first efficient O(1)-approximation algorithms for Feedback MAB and Monotone bandits.
Findings
Develops a greedy policy that is a (2+epsilon)-approximation for Feedback MAB.
Defines Monotone bandits, a broader subclass, and gives a 2-approximation policy for it.
Extends the technique to handle side constraints such as blocking and switching costs.
Abstract
The restless bandit problem is one of the most well-studied generalizations of the celebrated stochastic multi-armed bandit problem in decision theory. In its ultimate generality, the restless bandit problem is known to be PSPACE-Hard to approximate to any non-trivial factor, and little progress has been made despite its importance in modeling activity allocation under uncertainty. We consider a special case that we call Feedback MAB, where the reward obtained by playing each of n independent arms varies according to an underlying on/off Markov process whose exact state is only revealed when the arm is played. The goal is to design a policy for playing the arms in order to maximize the infinite horizon time average expected reward. This problem is also an instance of a Partially Observable Markov Decision Process (POMDP), and is widely studied in wireless scheduling and unmanned…
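To make the Feedback MAB model in the abstract concrete, here is a minimal simulation sketch in Python. It is an illustration under stated assumptions, not the paper's algorithm: the names `p_on`, `p_off`, `reward`, `next_belief`, and `simulate` are hypothetical, an "on" arm is assumed to pay a fixed reward and an "off" arm to pay nothing, and the policy shown is a plain myopic rule (play the arm with the highest expected immediate reward), not the (2+epsilon)-approximate policy the paper analyzes.

```python
import random


def next_belief(b, p_on, p_off):
    """One-step belief update for an arm that was NOT played.

    b is the probability the arm is currently on; p_on = P(off -> on) and
    p_off = P(on -> off) are hypothetical names for the chain's parameters.
    """
    return b * (1.0 - p_off) + (1.0 - b) * p_on


class FeedbackArm:
    """On/off Markov arm whose true state is revealed only when played."""

    def __init__(self, p_on, p_off, reward, rng):
        self.p_on, self.p_off, self.reward, self.rng = p_on, p_off, reward, rng
        stationary = p_on / (p_on + p_off)
        self.state = rng.random() < stationary  # hidden true state
        self.belief = stationary                # player's P(arm is on)

    def play(self):
        """Observe the state, collapse the belief, and collect the reward."""
        self.belief = 1.0 if self.state else 0.0
        return self.reward if self.state else 0.0

    def advance(self):
        """Evolve the hidden state and the belief by one time step."""
        if self.state:
            self.state = self.rng.random() >= self.p_off
        else:
            self.state = self.rng.random() < self.p_on
        self.belief = next_belief(self.belief, self.p_on, self.p_off)


def simulate(params, horizon, seed=0):
    """Time-average reward of the myopic policy (assumed, not the paper's)."""
    rng = random.Random(seed)
    arms = [FeedbackArm(p_on, p_off, r, rng) for p_on, p_off, r in params]
    total = 0.0
    for _ in range(horizon):
        # Play the arm with the highest expected immediate reward.
        best = max(arms, key=lambda a: a.belief * a.reward)
        total += best.play()
        for arm in arms:  # every arm evolves, played or not ("restless")
            arm.advance()
    return total / horizon


if __name__ == "__main__":
    # Three hypothetical arms: (p_on, p_off, reward).
    print(simulate([(0.2, 0.1, 1.0), (0.5, 0.5, 0.8), (0.05, 0.02, 1.0)], 10_000))
```

The belief update is the only place the "feedback" structure appears: a played arm's belief collapses to 0 or 1, while an unplayed arm's belief drifts toward the chain's stationary probability `p_on / (p_on + p_off)`.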
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Optimization and Search Problems
