Practical Policy Distillation for Reinforcement Learning in Radio Access Networks

Sara Khosravi; Burak Demirel; Linghui Zhou; Javier Rasines; Pablo Soldati

arXiv:2511.06563·cs.LG·January 29, 2026

TL;DR

This paper explores policy distillation techniques to create compact, generalizable reinforcement learning models for radio access networks, addressing hardware constraints and maintaining performance across diverse network scenarios.

Contribution

It introduces single-policy and multi-policy distillation methods tailored for resource-limited RAN hardware, enabling deployment of effective RL models in legacy 4G and 5G systems.
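The excerpt does not state the exact training objective, so the following is only a minimal sketch of one common formulation of policy distillation: the student is trained to minimize the KL divergence from the teacher's action distribution to its own. The function names, the temperature parameter, and the plain averaging over teachers in the multi-policy case are illustrative assumptions, not the paper's method.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over one state's action logits.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))

def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    # Mean KL(teacher || student) over a batch of states.
    # Assumption: both policies output logits over the same action set.
    losses = [
        kl_divergence(softmax(t, temperature), softmax(s, temperature))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(losses) / len(losses)

def multi_policy_distillation_loss(pairs, temperature=1.0):
    # Hypothetical multi-teacher variant: each pair holds one teacher's
    # logits and the student's logits on that teacher's states; the
    # per-teacher losses are averaged with equal weight.
    per_teacher = [distillation_loss(t, s, temperature) for t, s in pairs]
    return sum(per_teacher) / len(per_teacher)
```

In this sketch the loss is zero when the student reproduces the teacher's action distribution exactly and grows as the two distributions diverge; a real training loop would backpropagate this loss through the student network only.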

Findings

1. Distilled models retain teacher performance in diverse scenarios

2. Both strategies produce models under 1 MB with sub-100 μs inference

3. Experimental results validate effectiveness in 5G-compliant simulators
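As a rough illustration of what the size constraint implies, the sketch below estimates the parameter memory of a fully connected network and compares it with a byte budget. The layer widths, the 4-byte (float32) parameters, and the decimal 1,000,000-byte budget are assumptions chosen for the example; the paper's actual architectures are not given in this excerpt, and inference latency depends on the target hardware and is not modeled here.

```python
def mlp_param_count(layer_sizes):
    # Weights plus biases for each consecutive pair of layers.
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def model_size_bytes(layer_sizes, bytes_per_param=4):
    # Assumption: every parameter is stored as float32 (4 bytes).
    return mlp_param_count(layer_sizes) * bytes_per_param

def fits_budget(layer_sizes, budget_bytes=1_000_000, bytes_per_param=4):
    # True if the parameter memory alone stays within the budget.
    return model_size_bytes(layer_sizes, bytes_per_param) <= budget_bytes

# Hypothetical student: 64 inputs, two hidden layers of 128, 16 actions.
student = [64, 128, 128, 16]
print(mlp_param_count(student))   # 26896 parameters
print(model_size_bytes(student))  # 107584 bytes
print(fits_budget(student))       # True
```

Under the same assumptions, widening both hidden layers to 512 units gives 304,144 parameters (about 1.2 MB in float32), which would exceed the budget.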

Abstract

Adopting artificial intelligence (AI) in radio access networks (RANs) presents several challenges, including limited availability of link-level measurements (e.g., CQI reports), stringent real-time processing constraints (e.g., sub-1 ms per TTI), and network heterogeneity (different spectrum bands, cell types, and vendor equipment). A critical yet often overlooked barrier lies in the computational and memory limitations of RAN baseband hardware, particularly in legacy fourth-generation (4G) systems, which typically lack on-chip neural accelerators. As a result, only lightweight AI models (under 1 Mb and sub-100 μs inference time) can be effectively deployed, limiting both their performance and applicability. However, achieving strong generalization across diverse network conditions often requires large-scale models with substantial resource demands. To address this trade-off, this paper…

Taxonomy

Topics: Advanced MIMO Systems Optimization · Software-Defined Networks and 5G · Wireless Signal Modulation Classification