A Sensing Policy Based on Confidence Bounds and a Restless Multi-Armed Bandit Model
Jan Oksanen, Visa Koivunen, H. Vincent Poor

TL;DR
This paper proposes a sensing policy for the restless multi-armed bandit problem in cognitive radios. The policy is an index policy that adds a confidence term to a sample mean; it attains asymptotically logarithmic weak regret and outperforms existing methods in simulations.
Contribution
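The work proposes a centrally coordinated index policy, built from a sample mean term and a confidence term, that attains asymptotically logarithmic weak regret in restless bandit scenarios when the rewards are bounded and either i.i.d. or finite-state Markovian.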
Findings
Achieves asymptotically logarithmic weak regret
Simulation results confirm superior performance over existing methods
Policy effectively balances exploration and exploitation
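For readers unfamiliar with the term, weak regret is commonly defined as the gap between the expected reward of always playing the single best arm and the expected reward the policy actually collects. The block below is a sketch of that standard definition, not a formula taken from this page. The symbols are assumptions introduced here: one band is sensed per time slot, \mu_k is the stationary mean reward of band k, a_t is the band chosen at time t, and r_{a_t}(t) is the reward observed.

```latex
% Weak regret after n time slots (single band sensed per slot, assumed notation)
R(n) = n\,\mu^{*} - \mathbb{E}\!\left[\sum_{t=1}^{n} r_{a_t}(t)\right],
\qquad \mu^{*} = \max_{k} \mu_k .
```

Logarithmic weak regret then means that R(n) grows on the order of \log n.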
Abstract
A sensing policy for the restless multi-armed bandit problem with stationary but unknown reward distributions is proposed. The work is presented in the context of cognitive radios, in which the bandit problem arises when deciding which parts of the spectrum to sense and exploit. It is shown that the proposed policy attains an asymptotically logarithmic weak regret rate when the rewards are bounded and either independent and identically distributed or finite-state Markovian. Simulation results verifying uniformly logarithmic weak regret are also presented. The proposed policy is a centrally coordinated index policy, in which the index of a frequency band consists of a sample mean term and a confidence term. The sample mean term promotes spectrum exploitation, whereas the confidence term encourages exploration. The confidence term is designed such that the time interval between consecutive sensing…
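The index structure described in the abstract (sample mean plus confidence term) can be illustrated with a minimal sketch. The paper's own confidence term is not given on this page, so the code below substitutes the classic UCB1 term sqrt(2 ln t / n) as a stand-in; it is not the authors' policy. The function names `ucb_index` and `run_policy`, the Bernoulli "band is free" reward model, and the example probabilities are all hypothetical, and only the i.i.d. reward case is simulated.

```python
import math
import random


def ucb_index(sample_mean, n_sensed, t):
    """Index of one band: sample mean (exploitation) plus a confidence
    term (exploration). The UCB1 term here is an assumed stand-in, not
    the confidence term proposed in the paper."""
    return sample_mean + math.sqrt(2.0 * math.log(t) / n_sensed)


def run_policy(free_probs, horizon, seed=0):
    """Sense one band per time slot, always picking the largest index.

    free_probs[i] is the (hypothetical) probability that band i is free,
    modelled as an i.i.d. Bernoulli reward.
    """
    rng = random.Random(seed)
    k = len(free_probs)
    counts = [0] * k      # how many times each band has been sensed
    means = [0.0] * k     # running sample mean reward of each band
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            band = t - 1  # initialisation: sense every band once
        else:
            band = max(range(k),
                       key=lambda i: ucb_index(means[i], counts[i], t))
        reward = 1.0 if rng.random() < free_probs[band] else 0.0
        counts[band] += 1
        means[band] += (reward - means[band]) / counts[band]  # incremental mean
        total_reward += reward
    return counts, means, total_reward


if __name__ == "__main__":
    counts, means, total = run_policy([0.2, 0.5, 0.9], horizon=5000)
    print("times sensed per band:", counts)
    print("estimated mean rewards:", [round(m, 3) for m in means])
```

As a band goes unsensed, its count stays fixed while ln t keeps growing, so its index rises until the band is sensed again; this is the mechanism by which the confidence term forces occasional exploration of bands that currently look poor.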
Taxonomy
Topics: Advanced Bandit Algorithms Research · Cognitive Radio Networks and Spectrum Sensing · Smart Grid Energy Management
