Deep Index Policy for Multi-Resource Restless Matching Bandit and Its Application in Multi-Channel Scheduling
Nida Zamir, I-Hong Hou

TL;DR
This paper introduces a Deep Index Policy (DIP) for multi-resource restless matching bandits, enabling efficient resource allocation in complex multi-channel systems through online learning and policy gradients.
Contribution
The paper presents a novel Deep Index Policy (DIP) algorithm that uses policy gradients to learn partial indexes for multi-resource restless bandits; the formulation also applies beyond wireless systems.
Findings
DIP efficiently learns partial indexes in simulations.
DIP outperforms baseline methods in resource allocation tasks.
The approach generalizes to various multi-resource applications.
Abstract
Scheduling in multi-channel wireless communication systems presents formidable challenges in effectively allocating resources. To address these challenges, we investigate a multi-resource restless matching bandit (MR-RMB) model for heterogeneous resource systems, with the objective of maximizing long-term discounted total rewards while respecting resource constraints. We also generalize the model to applications beyond multi-channel wireless systems. We discuss the Max-Weight Index Matching algorithm, which optimizes resource allocation based on learned partial indexes. We derive the policy gradient theorem for index learning. Our main contribution is the introduction of a new Deep Index Policy (DIP), an online learning algorithm tailored for MR-RMB. DIP learns the partial index by leveraging the policy gradient theorem for restless arms with convoluted and unknown transition kernels of…
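To make the matching step in the abstract concrete, here is a minimal sketch of a max-weight matching over a table of partial indexes. It is an illustration under stated assumptions, not the authors' implementation: the function name max_weight_index_matching, the table layout index[arm][resource], the one-resource-per-arm constraint, and the rule of leaving non-positive pairs unmatched are all hypothetical choices made here. In DIP the index values would come from a learned model; in this sketch they are hard-coded numbers.

```python
from itertools import permutations


def max_weight_index_matching(index):
    """Brute-force max-weight matching of arms to resources.

    index[i][j] is an assumed partial index of arm i on resource j.
    Each arm gets at most one resource and each resource serves at most
    one arm (an assumption of this sketch). Pairs with a non-positive
    index are left unmatched.
    """
    n_arms = len(index)
    n_res = len(index[0]) if n_arms else 0

    # Enumerate every way to pair up the smaller side with the larger side.
    if n_arms >= n_res:
        candidates = (
            list(zip(arms, range(n_res)))
            for arms in permutations(range(n_arms), n_res)
        )
    else:
        candidates = (
            list(zip(range(n_arms), res))
            for res in permutations(range(n_res), n_arms)
        )

    best_value, best_match = 0.0, {}
    for pairs in candidates:
        # Keep only pairs that add positive weight to the matching.
        kept = {arm: res for arm, res in pairs if index[arm][res] > 0}
        value = sum(index[arm][res] for arm, res in kept.items())
        if value > best_value:
            best_value, best_match = value, kept
    return best_match, best_value


# Three arms competing for two resources (made-up index values).
example_index = [
    [3.0, 1.0],
    [2.0, 0.5],
    [0.2, 4.0],
]
match, total = max_weight_index_matching(example_index)
```

The exhaustive search is only practical for a handful of arms and resources; a polynomial-time assignment solver would be the natural substitute at larger scale.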
Taxonomy
Topics: Advanced Bandit Algorithms Research · Advanced Wireless Network Optimization · Optimization and Search Problems
