Reinforcement Learning for Finite-Horizon Restless Multi-Armed Multi-Action Bandits
Guojun Xiong, Jian Li, Rahul Singh

TL;DR
This paper introduces a new index policy for finite-horizon restless multi-armed bandits with multiple actions, proves that it is asymptotically optimal, and presents a learning algorithm that outperforms existing methods in both regret and runtime.
Contribution
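It proposes the Occupancy-Measured-Reward Index Policy for finite-horizon restless multi-armed bandits with multiple actions (R(MA)^2B) and proves that the policy is asymptotically optimal. For the case of unknown system parameters, it develops the R(MA)^2B-UCB learning algorithm, which achieves sub-linear regret and improves on existing algorithms in experiments.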
Findings
The policy is well-defined even if the underlying MDPs are not indexable.
The learning algorithm achieves sub-linear regret.
In experiments, the learning algorithm outperforms existing algorithms in both regret and runtime.
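For readers unfamiliar with the term, "sub-linear regret" can be stated as follows. The notation here is an assumption made for illustration (K learning episodes, V^{\pi} the expected cumulative reward of policy \pi over one finite horizon, \pi^* a benchmark policy computed with known parameters, \pi_k the policy used in episode k); the paper's exact regret definition and rate are not reproduced here.

```latex
% Illustrative notation only; not taken from the paper.
\[
  \mathrm{Reg}(K) \;=\; K\,V^{\pi^*} \;-\; \sum_{k=1}^{K} V^{\pi_k},
  \qquad
  \text{sub-linear regret:}\quad \lim_{K\to\infty} \frac{\mathrm{Reg}(K)}{K} = 0 .
\]
```

In words, the average per-episode loss relative to the benchmark vanishes as the number of episodes grows.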
Abstract
We study a finite-horizon restless multi-armed bandit problem with multiple actions, dubbed R(MA)^2B. The state of each arm evolves according to a controlled Markov decision process (MDP), and the reward of pulling an arm depends on both the current state of the corresponding MDP and the action taken. The goal is to sequentially choose actions for arms so as to maximize the expected value of the cumulative rewards collected. Since finding the optimal policy is typically intractable, we propose a computationally appealing index policy which we call Occupancy-Measured-Reward Index Policy. Our policy is well-defined even if the underlying MDPs are not indexable. We prove that it is asymptotically optimal when the activation budget and number of arms are scaled up, while keeping their ratio as a constant. For the case when the system parameters are unknown, we develop a learning algorithm.…
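The setting described in the abstract can be made concrete with a minimal simulation sketch. Everything below is an assumption made for illustration: the per-action cost vector `costs`, the toy instance in `make_instance`, and in particular `myopic_policy`, which is a simple greedy baseline and not the paper's Occupancy-Measured-Reward Index Policy. The sketch only shows the moving parts: each arm's state evolves as a controlled MDP, the reward depends on state and action, and the actions chosen at each step must respect a budget.

```python
import random


def make_instance(num_arms=4):
    """Toy instance (hypothetical numbers): 2 states, 3 actions per arm.

    P[n][s][a] is the next-state distribution of arm n in state s under action a.
    rewards[n][s][a] is the reward of arm n in state s under action a.
    """
    P = [[[[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]] for _ in range(2)]
         for _ in range(num_arms)]
    rewards = [[[0.0, 0.3, 0.5], [1.0, 1.2, 1.3]] for _ in range(num_arms)]
    return P, rewards


def step_arm(P_arm, state, action, rng):
    """Sample the next state of one arm from its controlled transition kernel."""
    probs = P_arm[state][action]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def myopic_policy(states, rewards, costs, budget):
    """Greedy baseline (NOT the paper's index policy).

    All arms start at action 0, assumed to cost nothing. Arm/action pairs are
    then ranked by immediate reward gain per unit cost and accepted while the
    remaining budget allows.
    """
    actions = [0] * len(states)
    candidates = []
    for n, s in enumerate(states):
        for a in range(1, len(costs)):
            gain = rewards[n][s][a] - rewards[n][s][0]
            if gain > 0:
                candidates.append((gain / costs[a], n, a))
    candidates.sort(reverse=True)
    remaining = budget
    for _, n, a in candidates:
        if actions[n] == 0 and costs[a] <= remaining:
            actions[n] = a
            remaining -= costs[a]
    return actions


def simulate(P, rewards, costs, budget, horizon, init_states, policy, seed=0):
    """Run one finite-horizon episode and return the cumulative reward."""
    rng = random.Random(seed)
    states = list(init_states)
    total = 0.0
    for _ in range(horizon):
        actions = policy(states, rewards, costs, budget)
        # The budget constraint must hold at every decision epoch.
        assert sum(costs[a] for a in actions) <= budget
        total += sum(rewards[n][s][a]
                     for n, (s, a) in enumerate(zip(states, actions)))
        # Restless: every arm transitions, including arms left at action 0.
        states = [step_arm(P[n], s, a, rng)
                  for n, (s, a) in enumerate(zip(states, actions))]
    return total


if __name__ == "__main__":
    P, rewards = make_instance(num_arms=4)
    total = simulate(P, rewards, costs=[0, 1, 2], budget=3, horizon=5,
                     init_states=[0, 0, 0, 0], policy=myopic_policy, seed=1)
    print("cumulative reward:", total)
```

Any function with the same signature as `myopic_policy` can be passed to `simulate`, so the sketch can serve as a harness for comparing budget-respecting policies on small instances.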
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management
