Online Learning in Opportunistic Spectrum Access: A Restless Bandit Approach
Cem Tekin, Mingyan Liu

TL;DR
This paper studies online learning for opportunistic spectrum access in which each channel's condition is modeled as a finite-state Markov chain. It proposes an algorithm for this restless bandit setting that achieves optimal logarithmic regret, measured against always using the best channel.
Contribution
The paper introduces an index-based algorithm that uses the regenerative cycles of each channel's Markov chain to handle Markovian rewards in restless bandits, and it achieves optimal logarithmic regret.
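To make the idea of learning from regenerative cycles concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' algorithm as specified in the paper: the class `MarkovChannel`, the function `regenerative_index_policy`, the choice of state 0 as the regenerative state, the UCB-style index, and the exploration constant `L` are all hypothetical. The sketch commits to one channel at a time, waits until that channel hits its regenerative state, and only records observations from that point until the chain returns to the same state, so the recorded samples form complete cycles even though every channel keeps evolving in the background.

```python
import math
import random


class MarkovChannel:
    """Restless channel: its state advances every time slot, probed or not."""

    def __init__(self, transition, rewards, rng):
        self.P = transition      # row-stochastic matrix (list of lists)
        self.rewards = rewards   # reward attached to each state
        self.rng = rng
        self.state = 0

    def step(self):
        row = self.P[self.state]
        self.state = self.rng.choices(range(len(row)), weights=row)[0]


def regenerative_index_policy(channels, horizon, regen_state=0, L=2.0):
    """Hypothetical sketch: sample-mean index computed from full cycles only."""
    k = len(channels)
    total = [0.0] * k   # sum of rewards recorded inside regenerative cycles
    count = [0] * k     # number of recorded samples per channel
    collected = 0.0     # all reward earned, whether recorded or not
    t = 0

    def probe(arm):
        """Advance every channel one slot, then observe the chosen one."""
        nonlocal t, collected
        for ch in channels:
            ch.step()
        t += 1
        state = channels[arm].state
        collected += channels[arm].rewards[state]
        return state

    def index(i, n):
        # Assumed UCB-style index: sample mean plus an exploration bonus.
        return total[i] / count[i] + math.sqrt(L * math.log(n) / count[i])

    while t < horizon:
        untried = [i for i in range(k) if count[i] == 0]
        if untried:
            arm = untried[0]
        else:
            n = sum(count)
            arm = max(range(k), key=lambda i: index(i, n))

        # Phase 1: stay on the channel until it hits the regenerative state.
        state = probe(arm)
        while state != regen_state and t < horizon:
            state = probe(arm)

        # Phase 2: record samples until the chain returns to that state.
        # The returning observation itself is not recorded.
        while state == regen_state and t < horizon:
            while t < horizon:
                total[arm] += channels[arm].rewards[state]
                count[arm] += 1
                state = probe(arm)
                if state == regen_state:
                    break
            break

    return collected, count


if __name__ == "__main__":
    rng = random.Random(0)
    good = MarkovChannel([[0.3, 0.7], [0.2, 0.8]], [0.0, 1.0], rng)
    bad = MarkovChannel([[0.8, 0.2], [0.7, 0.3]], [0.0, 1.0], rng)
    reward, plays = regenerative_index_policy([good, bad], horizon=5000)
    print("average reward:", reward / 5000, "recorded samples:", plays)
```

Discarding the observations made while waiting for the regenerative state costs some samples, but it means the recorded averages behave like averages over independent cycles, which is what makes a sample-mean index usable when rewards are Markovian rather than i.i.d.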
Findings
The proposed algorithm attains logarithmic regret over time.
Regret bounds are proven to be optimal under mild conditions.
The approach extends bandit theory to Markovian, restless environments.
Abstract
We consider an opportunistic spectrum access (OSA) problem where the time-varying condition of each channel (e.g., as a result of random fading or certain primary users' activities) is modeled as an arbitrary finite-state Markov chain. At each instance of time, a (secondary) user probes a channel and collects a certain reward as a function of the state of the channel (e.g., good channel condition results in higher data rate for the user). Each channel has potentially different state space and statistics, both unknown to the user, who tries to learn which one is the best as it goes and maximizes its usage of the best channel. The objective is to construct a good online learning algorithm so as to minimize the difference between the user's performance in total rewards and that of using the best channel (on average) had it known which one is the best from a priori knowledge of the channel…
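The objective described in the abstract can be written as a regret against the best single channel. The notation below is assumed for illustration and is not taken from the abstract: $K$ channels, state space $S^{i}$ for channel $i$, reward $r^{i}_{x}$ in state $x$, stationary distribution $\pi^{i}$, and $\alpha(t)$ the channel probed at time $t$.

```latex
\mu^{i} = \sum_{x \in S^{i}} r^{i}_{x}\, \pi^{i}_{x}, \qquad
\mu^{*} = \max_{1 \le i \le K} \mu^{i}, \qquad
R(n) = n\,\mu^{*} - \mathbb{E}\!\left[ \sum_{t=1}^{n} r_{\alpha(t)}(t) \right]
```

Under this definition, logarithmic regret means $R(n) = O(\log n)$: the gap to always using the best channel grows only logarithmically in the number of time steps $n$.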
Taxonomy
Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Smart Grid Energy Management
