Structure and Optimality of Myopic Policy in Opportunistic Access with Noisy Observations
Qing Zhao, Bhaskar Krishnamachari

TL;DR
This paper analyzes a restless multi-armed bandit problem arising in multichannel opportunistic communications with noisy channel state observations. It shows that the myopic policy has a simple, semi-universal structure, proves the policy optimal for two channels, and conjectures optimality for more than two.
Contribution
It establishes a simple, semi-universal structure of the myopic policy for noisy multichannel access, one that does not require knowledge of the Markov transition probabilities, and proves that the myopic policy is optimal for two channels.
Findings
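The myopic policy has a semi-universal structure under a certain condition on the false alarm probability of the channel state detector.
The myopic policy is proved optimal for the case of two channels.
Numerical examples support the conjecture that the myopic policy is also optimal for more than two channels.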
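To make the round-robin idea concrete, here is a minimal Python sketch. The summary does not spell out the procedure, so this assumes the stay-on-success, switch-on-failure form known from the error-free version of the problem with positively correlated channels. The function name `round_robin_select` and the queue layout are hypothetical, chosen only for illustration.

```python
from collections import deque


def round_robin_select(order, ack):
    """Return the channel to access next.

    order: deque of channel indices; order[0] is the channel just accessed.
    ack:   True if the last transmission on order[0] was acknowledged.

    Assumed rule (not taken from the summary): keep the current channel
    after an ACK; otherwise send it to the back of the queue and move on.
    """
    if not ack:
        order.rotate(-1)  # the head of the queue moves to the tail
    return order[0]


# Usage: three channels, initially ordered 0, 1, 2.
queue = deque([0, 1, 2])
first = round_robin_select(queue, ack=True)    # stays on channel 0
second = round_robin_select(queue, ack=False)  # moves on to channel 1
```

Note that the rule never refers to the transition probabilities, which is the sense in which such a structure can avoid needing them.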
Abstract
A restless multi-armed bandit problem that arises in multichannel opportunistic communications is considered, where channels are modeled as independent and identical Gilbert-Elliott channels and channel state observations are subject to errors. A simple structure of the myopic policy is established under a certain condition on the false alarm probability of the channel state detector. It is shown that the myopic policy has a semi-universal structure that reduces channel selection to a simple round-robin procedure and obviates the need to know the underlying Markov transition probabilities. The optimality of the myopic policy is proved for the case of two channels and conjectured for the general case based on numerical examples.
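As an illustration of what a myopic policy computes in this setting, the following Python sketch tracks, for each channel, the belief that it is in the good state and always accesses the channel with the highest belief. The observation model is an assumption, not something stated in the abstract: an ACK is taken to confirm that the channel was good, and a missing ACK is attributed either to a bad channel or to a false alarm with probability `eps`. The names `predict`, `update_selected`, and `myopic_step`, and the parameters `p11`, `p01`, and `eps`, are hypothetical.

```python
def predict(omega, p11, p01):
    """One-step Markov prediction of the probability that a channel is good.

    p11: P(good next | good now); p01: P(good next | bad now).
    """
    return omega * p11 + (1.0 - omega) * p01


def update_selected(omega, ack, p11, p01, eps):
    """Belief update for the accessed channel (assumed observation model)."""
    if ack:
        # Assumption: an ACK can only occur when the channel was good.
        return p11
    # No ACK: the channel was good but a false alarm occurred, or it was bad.
    denom = eps * omega + (1.0 - omega)
    posterior = eps * omega / denom if denom > 0.0 else omega
    return predict(posterior, p11, p01)


def myopic_step(beliefs, ack_fn, p11, p01, eps):
    """Access the channel with the largest belief, then update all beliefs.

    ack_fn(i) returns True if transmission on channel i is acknowledged.
    """
    chosen = max(range(len(beliefs)), key=lambda i: beliefs[i])
    ack = ack_fn(chosen)
    updated = [
        update_selected(b, ack, p11, p01, eps) if i == chosen
        else predict(b, p11, p01)
        for i, b in enumerate(beliefs)
    ]
    return chosen, updated


# Usage: two channels, with an ACK received on whichever channel is accessed.
chosen, beliefs = myopic_step([0.3, 0.6], lambda i: True, p11=0.8, p01=0.2, eps=0.1)
```

Unlike a round-robin rule, this direct form needs `p11`, `p01`, and `eps` to update the beliefs, which is why a structural result that removes that dependence is useful.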
Taxonomy
Topics: Cognitive Radio Networks and Spectrum Sensing · Advanced Bandit Algorithms Research · Advanced MIMO Systems Optimization
