Decentralized Restless Bandit with Multiple Players and Unknown Dynamics
Haoyang Liu, Keqin Liu, Qing Zhao

TL;DR
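This paper develops a decentralized arm selection policy for restless multi-armed bandit problems with multiple players and unknown dynamics. The policy achieves logarithmic-order regret when a nontrivial bound on certain system parameters is known, and regret arbitrarily close to logarithmic order when nothing about the system is known.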
Contribution
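It introduces a decentralized arm selection policy that handles both unknown Markovian reward dynamics and collisions among players. The policy attains logarithmic-order regret when a nontrivial bound on certain system parameters is known, and near-logarithmic regret otherwise.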
Findings
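Achieves logarithmic-order regret when an arbitrary nontrivial bound on certain system parameters is known.
Extends to regret arbitrarily close to logarithmic order when no knowledge about the system is available.
Finds applications in communication networks, finance, and industrial systems.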
Abstract
We consider decentralized restless multi-armed bandit problems with unknown dynamics and multiple players. The reward state of each arm transits according to an unknown Markovian rule when it is played and evolves according to an arbitrary unknown random process when it is passive. Players activating the same arm at the same time collide and suffer from reward loss. The objective is to maximize the long-term reward by designing a decentralized arm selection policy to address unknown reward models and collisions among players. A decentralized policy is constructed that achieves a regret with logarithmic order when an arbitrary nontrivial bound on certain system parameters is known. When no knowledge about the system is available, we extend the policy to achieve a regret arbitrarily close to the logarithmic order. The result finds applications in communication networks, financial…
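The setting described in the abstract can be illustrated with a small simulation of one time slot. This is a minimal sketch of the problem model only, not the paper's policy. Several details are assumptions made for illustration: each arm is a two-state Markov chain, colliding players receive zero reward (the abstract says only that they "suffer from reward loss"), and passive arms evolve by the same chain as active ones (the abstract allows an arbitrary passive process). The names `step_chain` and `play_round` are hypothetical.

```python
import random


def step_chain(state, p_stay, rng):
    """Advance a two-state chain: stay in `state` with probability p_stay[state]."""
    return state if rng.random() < p_stay[state] else 1 - state


def play_round(states, choices, rewards, p_stay, rng):
    """Simulate one time slot.

    states  : current state (0 or 1) of every arm
    choices : arm index picked by each player
    rewards : rewards[arm][state] is the reward of `arm` in `state`
    p_stay  : p_stay[arm][state] is the probability of staying in `state`
    Returns (new_states, payoffs), one payoff per player.
    """
    # Count how many players picked each arm.
    counts = {}
    for arm in choices:
        counts[arm] = counts.get(arm, 0) + 1

    # Assumption: a player who shares an arm with anyone else gets nothing.
    payoffs = [
        rewards[arm][states[arm]] if counts[arm] == 1 else 0.0
        for arm in choices
    ]

    # Restless arms: every arm changes state, played or not.
    # Assumption: passive arms use the same chain as active ones.
    new_states = [step_chain(s, p_stay[i], rng) for i, s in enumerate(states)]
    return new_states, payoffs


if __name__ == "__main__":
    rng = random.Random(0)
    rewards = [[0.1, 1.0], [0.2, 0.8], [0.3, 0.9]]
    p_stay = [(0.9, 0.7), (0.8, 0.6), (0.5, 0.5)]
    states = [0, 1, 0]
    # Players 0 and 1 collide on arm 0; player 2 plays arm 2 alone.
    states, payoffs = play_round(states, [0, 0, 2], rewards, p_stay, rng)
    print(payoffs)
```

A decentralized policy in this model has to choose `choices` for each player from that player's own observations only, without knowing `rewards` or `p_stay` and without a central coordinator to prevent collisions.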
Taxonomy
Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Smart Grid Energy Management
