TL;DR
This paper studies fully decentralized wireless networks that use Multi-Armed Bandits (MABs) for dynamic channel selection and transmission power control. It shows that sequential learning strategies can achieve fairness and reduce throughput variability in adversarial environments.
Contribution
It compares four MAB action-selection strategies (ε-greedy, EXP3, UCB, and Thompson sampling) in decentralized wireless settings and shows that sequential learning improves stability and fairness under adversarial conditions.
Findings
Optimal proportional fairness is achievable without neighbor information.
Sequential learning reduces throughput variability in adversarial settings.
UCB and Thompson sampling outperform ε-greedy and EXP3 in stability.
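As a rough illustration of the fairness criterion named in the findings, proportional fairness is commonly scored as the sum of the logarithms of the per-network throughputs, so that starving one network is penalized heavily. The sketch below assumes that common definition; the function name and the throughput values are hypothetical and are not taken from the paper.

```python
import math


def proportional_fairness(throughputs):
    """Sum of log-throughputs (assumed, commonly used definition).

    Every throughput must be strictly positive, otherwise the
    logarithm is undefined.
    """
    return sum(math.log(t) for t in throughputs)


# Hypothetical throughputs (Mbps) for two overlapping networks.
balanced = [10.0, 10.0]    # both networks get an equal share
unbalanced = [19.0, 1.0]   # same total, but one network is starved

pf_balanced = proportional_fairness(balanced)
pf_unbalanced = proportional_fairness(unbalanced)
```

Both allocations carry the same aggregate throughput, yet the balanced one scores higher under this metric, which is why it is often used as a joint measure of efficiency and fairness.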
Abstract
Next-generation wireless deployments are characterized by being dense and uncoordinated, which often leads to inefficient use of resources and poor performance. To solve this, we envision the utilization of completely decentralized mechanisms to enable Spatial Reuse (SR). In particular, we focus on dynamic channel selection and Transmission Power Control (TPC). We rely on Reinforcement Learning (RL), and more specifically on Multi-Armed Bandits (MABs), to allow networks to learn their best configuration. In this work, we study the exploration-exploitation trade-off by means of the ε-greedy, EXP3, UCB and Thompson sampling action-selection strategies, and compare their performance. In addition, we study the implications of selecting actions simultaneously in an adversarial setting (i.e., concurrently), and compare it with a sequential approach. Our results show that optimal proportional…
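To make the exploration-exploitation trade-off mentioned in the abstract more concrete, here is a minimal sketch of two of the named strategies, ε-greedy and UCB, applied to arms formed by (channel, transmit power) pairs. It uses the textbook UCB1 index and is not the authors' implementation; the channel list, power levels, class name, and reward values are all hypothetical placeholders.

```python
import itertools
import math
import random

# Hypothetical action space: each arm is a (channel, tx power in dBm) pair.
CHANNELS = [1, 2]
TX_POWERS_DBM = [5, 20]
ARMS = list(itertools.product(CHANNELS, TX_POWERS_DBM))


class ArmStats:
    """Per-arm pull counts and running mean rewards for one network."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms

    def update(self, arm, reward):
        # Incremental mean update after observing a reward for `arm`.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def epsilon_greedy(stats, epsilon, rng):
    """Explore uniformly with probability epsilon, else pick the best mean."""
    if rng.random() < epsilon:
        return rng.randrange(len(stats.means))
    return max(range(len(stats.means)), key=lambda a: stats.means[a])


def ucb1(stats):
    """Textbook UCB1: try each arm once, then maximize mean plus bonus."""
    for arm, n in enumerate(stats.counts):
        if n == 0:
            return arm
    total = sum(stats.counts)
    return max(
        range(len(stats.means)),
        key=lambda a: stats.means[a]
        + math.sqrt(2.0 * math.log(total) / stats.counts[a]),
    )


# Usage: feed one made-up normalized reward per arm, then select.
stats = ArmStats(len(ARMS))
first_choice = ucb1(stats)                 # no data yet, so arm 0 is tried
for arm, reward in enumerate([0.1, 0.9, 0.2, 0.3]):
    stats.update(arm, reward)
greedy_choice = epsilon_greedy(stats, 0.0, random.Random(0))
ucb_choice = ucb1(stats)
```

In a decentralized deployment each network would keep its own `ArmStats` and use only its own observed reward (for example, normalized throughput), which matches the abstract's premise of learning without coordination. EXP3 and Thompson sampling would replace only the selection function.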
