Learning-Based Channel Access in Wi-Fi: A Multi-Armed Bandit Approach

Miguel Casasnovas; Francesc Wilhelmi; Richard Combes; Maksymilian Wojnar; Katarzyna Kosek-Szott; Szymon Szott; Anders Jonsson; Luis Esteve; Boris Bellalta

arXiv:2511.10143 · cs.NI · November 14, 2025

TL;DR

This paper explores reinforcement learning, specifically multi-armed bandit algorithms, to improve Wi-Fi channel access by making it more adaptive to dynamic network conditions, leading to better spectrum utilization.

Contribution

It introduces a multi-armed bandit framework for Wi-Fi medium access control, comparing single-agent and multi-agent approaches, and highlights the benefits of contextual information and decentralized learning.
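Counting arms makes the single-agent versus multi-agent contrast concrete. The sketch below is a minimal illustration that assumes hypothetical option sets (four 5 GHz primary channels, three channel widths, and seven CW values), none of them taken from the paper. A single agent (SA) learns over the joint action space, meaning every combination of the three dimensions. Multiple agents (MA) instead split a factorial action space, with each agent handling one dimension. The helper names `joint_action_space` and `factorial_action_spaces` are hypothetical.

```python
from itertools import product

# Hypothetical option sets for illustration only (not the paper's configuration).
PRIMARY_CHANNELS = [36, 40, 44, 48]            # 5 GHz channel numbers
CHANNEL_WIDTHS_MHZ = [20, 40, 80]              # bonding widths
CW_VALUES = [15, 31, 63, 127, 255, 511, 1023]  # contention window sizes


def joint_action_space(*dims):
    """Single agent (SA): one arm for every combination across all dimensions."""
    return list(product(*dims))


def factorial_action_spaces(*dims):
    """Multiple agents (MA): one agent per dimension, each with its own arm list."""
    return [list(d) for d in dims]


dims = (PRIMARY_CHANNELS, CHANNEL_WIDTHS_MHZ, CW_VALUES)
joint = joint_action_space(*dims)
factorial = factorial_action_spaces(*dims)

print(len(joint))                   # 84  (4 * 3 * 7 arms for one agent)
print([len(a) for a in factorial])  # [4, 3, 7]  (one arm list per agent)
```

With these assumed sets, the SA learner must explore 84 arms. The three MA learners explore only 4, 3, and 7 arms respectively, but each agent's reward then also depends on the choices the other agents make.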

Findings

01. Cooperative multi-agent (MA) architectures converge faster than single-agent (SA) ones.

02. Contextual bandit algorithms outperform non-contextual methods.

03. Decentralized learners can implicitly coordinate, but they may also give rise to policy-chasing dynamics.
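Finding 02 can be illustrated with the simplest kind of contextual bandit, which runs one independent UCB1 learner per discrete context. This is a minimal sketch, not the paper's algorithm. The class `TabularContextualUCB`, the three CW arms, the "light"/"heavy" load contexts, and the deterministic toy reward table are all hypothetical.

```python
import math
from collections import defaultdict


class TabularContextualUCB:
    """UCB1 run separately for each discrete context (hypothetical sketch).

    Feeding the same context every round reduces it to plain, non-contextual UCB1.
    """

    def __init__(self, n_arms, c=2.0):
        self.n_arms = n_arms
        self.c = c  # exploration weight; c = 2 gives the classic UCB1 bonus
        self.counts = defaultdict(lambda: [0] * n_arms)  # pulls per context and arm
        self.sums = defaultdict(lambda: [0.0] * n_arms)  # total reward per context and arm
        self.rounds = defaultdict(int)                   # rounds observed per context

    def select(self, ctx):
        counts = self.counts[ctx]
        for arm in range(self.n_arms):
            if counts[arm] == 0:  # try every arm once in each new context
                return arm
        sums, t = self.sums[ctx], self.rounds[ctx]
        scores = [sums[a] / counts[a] + math.sqrt(self.c * math.log(t) / counts[a])
                  for a in range(self.n_arms)]
        return max(range(self.n_arms), key=lambda a: scores[a])

    def update(self, ctx, arm, reward):
        self.rounds[ctx] += 1
        self.counts[ctx][arm] += 1
        self.sums[ctx][arm] += reward

    def best_arm(self, ctx):
        """Greedy choice: the arm with the highest empirical mean in this context."""
        means = [s / n if n else float("-inf")
                 for s, n in zip(self.sums[ctx], self.counts[ctx])]
        return max(range(self.n_arms), key=lambda a: means[a])


CW_ARMS = [15, 63, 255]  # hypothetical CW choices
# Hypothetical mean rewards: small CW pays off under light load, large CW under heavy load.
TOY_REWARD = {"light": [0.9, 0.6, 0.3], "heavy": [0.2, 0.5, 0.8]}


def run(contextual=True, rounds=2000):
    agent = TabularContextualUCB(len(CW_ARMS))
    for i in range(rounds):
        load = "light" if i % 2 == 0 else "heavy"  # load alternates every round
        ctx = load if contextual else None         # a non-contextual agent ignores it
        arm = agent.select(ctx)
        agent.update(ctx, arm, TOY_REWARD[load][arm])
    return agent


agent = run(contextual=True)
print(CW_ARMS[agent.best_arm("light")], CW_ARMS[agent.best_arm("heavy")])  # 15 255
```

In this toy table every arm averages 0.55 reward once the two load levels are pooled. A learner that ignores the load (`run(contextual=False)`) therefore has, in expectation, no arm to prefer. The per-context learner, by contrast, settles on CW 15 under light load and CW 255 under heavy load.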

Abstract

Due to its static protocol design, IEEE 802.11 (aka Wi-Fi) channel access lacks the adaptability needed to handle dynamic network conditions, resulting in inefficient spectrum utilization, unnecessary contention, and packet collisions. This paper investigates reinforcement learning (RL) solutions to optimize Wi-Fi's medium access control (MAC). In particular, a multi-armed bandit (MAB) framework is proposed for dynamic channel access (including both the primary channel and channel width) and contention window (CW) adjustment. In this setting, we study relevant learning design principles, such as adopting joint or factorial action spaces, handled by a single agent (SA) or by multiple agents (MA) respectively, and the importance of incorporating contextual information. Our simulation results show that cooperative MA architectures converge faster than their SA counterparts, as agents operate over…
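For context, the default (non-learned) CW behavior in the 802.11 distributed coordination function is binary exponential backoff: CW + 1 doubles after each failed attempt, up to a cap, and CW resets to its minimum after a success. The sketch below is simplified. It assumes the common best-effort bounds CWmin = 15 and CWmax = 1023 and ignores retry limits, slot timing, and other EDCA parameters. The function names `next_cw` and `draw_backoff` are hypothetical.

```python
import random

CW_MIN, CW_MAX = 15, 1023  # assumed best-effort bounds (typical for OFDM-based PHYs)


def next_cw(cw, success):
    """Binary exponential backoff: reset after a success; CW + 1 doubles after a failure."""
    if success:
        return CW_MIN
    return min(2 * (cw + 1) - 1, CW_MAX)  # capped at CW_MAX


def draw_backoff(cw, rng):
    """Backoff counter in slots, drawn uniformly from 0..cw inclusive."""
    return rng.randint(0, cw)


rng = random.Random(0)  # seeded so the sketch is reproducible
cw, history, backoffs = CW_MIN, [], []
for success in [False, False, False, True, False]:  # outcome of each attempt
    history.append(cw)                         # CW in force for this attempt
    backoffs.append(draw_backoff(cw, rng))     # slots waited before this attempt
    cw = next_cw(cw, success)
print(history)  # [15, 31, 63, 127, 15]
```

The rule is fixed by the standard and applies identically in every deployment, which is the kind of static design the abstract refers to. An MAB agent would instead choose the CW (and, in the paper's framework, the primary channel and channel width) based on observed rewards.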

Taxonomy

Topics: Wireless Networks and Protocols · Advanced Bandit Algorithms Research · Advanced Wireless Network Optimization