Dynamic Spectrum Access using Stochastic Multi-User Bandits
Meghana Bande, Akshayaa Magesh, Venugopal V. Veeravalli

TL;DR
This paper introduces a novel stochastic multi-user bandit algorithm for uncoordinated spectrum access, capable of handling more users than channels and dynamic user populations, with proven order-optimal regret bounds.
Contribution
It develops a new algorithm that accounts for non-zero rewards under collisions and adapts to changing user numbers, advancing spectrum access strategies.
Findings
Order-optimal $O(\log T)$ regret for fixed user and channel counts
Sub-linear regret in dynamic user scenarios
Regret guarantees hold even with more users than channels
Abstract
A stochastic multi-user multi-armed bandit framework is used to develop algorithms for uncoordinated spectrum access. In contrast to prior work, it is assumed that rewards can be non-zero even under collisions, thus allowing for the number of users to be greater than the number of channels. The proposed algorithm consists of an estimation phase and an allocation phase. It is shown that if every user adopts the algorithm, the system-wide regret is order-optimal, of order $O(\log T)$ over a time horizon of duration $T$. The regret guarantees hold for both the cases where the number of users is greater than or less than the number of channels. The algorithm is extended to the dynamic case where the number of users in the system evolves over time, and is shown to lead to sub-linear regret.
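To make the notion of system-wide regret concrete, the following is a minimal sketch of the benchmark that regret could be measured against. It is not the paper's algorithm. It assumes a reward model the abstract only hints at: the mean per-user reward on a channel depends on how many users share it and can stay positive under collisions. The names `mu`, `system_reward`, and `best_allocation`, as well as the numbers, are hypothetical and chosen for illustration only.

```python
from itertools import product


def system_reward(mu, counts):
    """Expected total reward when counts[k] users share channel k.

    Assumption: mu[k][n - 1] is the mean per-user reward on channel k
    when n users occupy it (non-zero even when n > 1).
    """
    return sum(n * mu[k][n - 1] for k, n in enumerate(counts) if n > 0)


def best_allocation(mu, num_users):
    """Brute-force the user counts per channel that maximize system reward."""
    num_channels = len(mu)
    best_counts, best_value = None, float("-inf")
    # Enumerate every way to split num_users across the channels.
    for counts in product(range(num_users + 1), repeat=num_channels):
        if sum(counts) != num_users:
            continue
        value = system_reward(mu, counts)
        if value > best_value:
            best_counts, best_value = counts, value
    return best_counts, best_value


# Hypothetical example: 2 channels, 3 users (more users than channels).
mu = [
    [0.9, 0.5, 0.2],  # channel 0: per-user mean with 1, 2, 3 users
    [0.6, 0.4, 0.1],  # channel 1: per-user mean with 1, 2, 3 users
]
counts, value = best_allocation(mu, 3)

# Per-round expected regret of a fixed, suboptimal allocation (2 users on
# channel 0, 1 user on channel 1) relative to the best allocation.
per_round_regret = value - system_reward(mu, (2, 1))
```

Under these assumed numbers, the best split places one user on channel 0 and two on channel 1, so sharing a channel beats leaving a user idle. An algorithm with $O(\log T)$ regret would settle on such an allocation quickly enough that the accumulated gap to this benchmark grows only logarithmically in $T$.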
