Deep Index Policy for Multi-Resource Restless Matching Bandit and Its Application in Multi-Channel Scheduling

Nida Zamir; I-Hong Hou

arXiv:2408.07205·cs.LG·August 21, 2024

Open Access

TL;DR

This paper introduces a Deep Index Policy (DIP) for multi-resource restless matching bandits, enabling efficient resource allocation in complex multi-channel systems through online learning and policy gradients.

Contribution

The paper presents a novel Deep Index Policy (DIP) algorithm that learns partial indexes for multi-resource restless matching bandits using policy gradients and that applies to settings beyond wireless systems.

Findings

01

DIP efficiently learns partial indexes in simulations.

02

DIP outperforms baseline methods in resource allocation tasks.

03

The approach generalizes to various multi-resource applications.

Abstract

Scheduling in multi-channel wireless communication systems presents formidable challenges in allocating resources effectively. To address these challenges, we investigate a multi-resource restless matching bandit (MR-RMB) model for heterogeneous resource systems, with the objective of maximizing long-term discounted total rewards while respecting resource constraints. We also generalize the model to applications beyond multi-channel wireless systems. We discuss the Max-Weight Index Matching algorithm, which optimizes resource allocation based on learned partial indexes, and we derive the policy gradient theorem for index learning. Our main contribution is the introduction of a new Deep Index Policy (DIP), an online learning algorithm tailored for MR-RMB. DIP learns the partial index by leveraging the policy gradient theorem for restless arms with convoluted and unknown transition kernels of…
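
The matching step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes that each arm (for example, a user) can be matched to at most one resource (for example, a channel), that each resource serves at most one arm, and that an arm is left passive when no positive gain is available. The function name `max_weight_index_matching` and the `index[i][k]` table of partial indexes are hypothetical names chosen for illustration. In DIP the indexes would be learned; here they are simply given as numbers. The search is an exhaustive dynamic program over resource subsets, so it is only practical for a handful of resources.

```python
from functools import lru_cache


def max_weight_index_matching(index):
    """Pick an arm-to-resource matching that maximizes the summed partial indexes.

    index[i][k] is a hypothetical partial index of arm i for resource k.
    Assumptions (not taken from the paper): one resource per arm, one arm
    per resource, and an arm may stay passive (assignment None).
    """
    n = len(index)
    m = len(index[0]) if n else 0

    @lru_cache(maxsize=None)
    def best(i, used):
        # Best (total, assignment) for arms i..n-1, where `used` is a
        # bitmask of the resources already taken by arms 0..i-1.
        if i == n:
            return 0.0, ()
        # Option 1: keep arm i passive.
        total, rest = best(i + 1, used)
        choice = (total, (None,) + rest)
        # Option 2: give arm i any resource that is still free.
        for k in range(m):
            if used & (1 << k):
                continue
            sub_total, sub_rest = best(i + 1, used | (1 << k))
            candidate = index[i][k] + sub_total
            if candidate > choice[0]:  # strict: ties keep the earlier option
                choice = (candidate, (k,) + sub_rest)
        return choice

    total, assignment = best(0, 0)
    return total, list(assignment)


# Three arms, two resources; the index values are made up for illustration.
example_index = [
    [2.0, 0.5],
    [1.5, 1.0],
    [0.2, -0.3],
]
total, assignment = max_weight_index_matching(example_index)
print(total, assignment)  # 3.0 [0, 1, None]
```

In the example, arm 0 takes resource 0 and arm 1 takes resource 1, and arm 2 stays passive because both resources are already used. For larger instances, a polynomial-time assignment solver would be a more natural choice than this exhaustive search.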

Taxonomy

Topics: Advanced Bandit Algorithms Research · Advanced Wireless Network Optimization · Optimization and Search Problems