Selecting Offline Reinforcement Learning Algorithms for Stochastic Network Control

Nicolas Helson; Pegah Alizadeh; Anastasios Giovanidis

arXiv:2603.03932 · cs.NI · March 5, 2026

Open Access

TL;DR

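This paper evaluates three offline RL algorithms (Conservative Q-Learning, Decision Transformers, and Critic-Guided Decision Transformers) for stochastic wireless network control in the open-access mobile-env environment. It finds Conservative Q-Learning the most robust and offers practical guidance for AI-driven network management.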

Contribution

The paper compares Bellman-based, sequence-based, and hybrid offline RL methods in a stochastic telecom environment (mobile-env) and identifies Conservative Q-Learning as the most robust approach across different sources of stochasticity.

Findings

01 Conservative Q-Learning consistently produces more robust policies than the other methods across different sources of stochasticity.

02 Sequence-based methods excel with abundant high-return trajectories.

03 The study provides practical guidance for offline RL algorithm selection in network control.

Abstract

Offline Reinforcement Learning (RL) is a promising approach for next-generation wireless networks, where online exploration is unsafe and large amounts of operational data can be reused across the model lifecycle. However, the behavior of offline RL algorithms under genuinely stochastic dynamics -- inherent to wireless systems due to fading, noise, and traffic mobility -- remains insufficiently understood. We address this gap by evaluating Bellman-based (Conservative Q-Learning), sequence-based (Decision Transformers), and hybrid (Critic-Guided Decision Transformers) offline RL methods in an open-access stochastic telecom environment (mobile-env). Our results show that Conservative Q-Learning consistently produces more robust policies across different sources of stochasticity, making it a reliable default choice in lifecycle-driven AI management frameworks. Sequence-based methods remain…
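For readers unfamiliar with the Bellman-based method named in the abstract, the following is a minimal sketch of the discrete-action Conservative Q-Learning objective in its standard CQL(H) form: a TD error plus a penalty that pushes down Q-values of actions not supported by the dataset. It is not the authors' implementation. The function names, the NumPy-only setup, and the default values of `gamma` and `alpha` are assumptions chosen for illustration, and the abstract does not state how the paper maps mobile-env actions to Q-values.

```python
import numpy as np

def cql_penalty(q_values, dataset_actions):
    """Conservative term: mean over the batch of
    logsumexp_a Q(s, a) - Q(s, a_data). Always non-negative."""
    q_values = np.asarray(q_values, dtype=float)
    dataset_actions = np.asarray(dataset_actions, dtype=int)
    # Numerically stable log-sum-exp over the action axis.
    q_max = q_values.max(axis=1, keepdims=True)
    lse = q_max[:, 0] + np.log(np.exp(q_values - q_max).sum(axis=1))
    # Q-value of the action actually logged in the dataset.
    q_data = q_values[np.arange(len(dataset_actions)), dataset_actions]
    return float(np.mean(lse - q_data))

def cql_loss(q_values, next_q_values, actions, rewards, dones,
             gamma=0.99, alpha=1.0):
    """Hypothetical helper: 0.5 * squared TD error + alpha * CQL penalty.
    gamma and alpha are illustrative defaults, not values from the paper."""
    q_values = np.asarray(q_values, dtype=float)
    next_q_values = np.asarray(next_q_values, dtype=float)
    actions = np.asarray(actions, dtype=int)
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    # One-step Bellman target using the greedy next-state value.
    target = rewards + gamma * (1.0 - dones) * next_q_values.max(axis=1)
    q_data = q_values[np.arange(len(actions)), actions]
    td_error = 0.5 * np.mean((q_data - target) ** 2)
    return float(td_error + alpha * cql_penalty(q_values, actions))
```

The penalty is smallest when the logged action already has the highest Q-value, so larger `alpha` keeps the learned policy closer to behavior seen in the data. That conservatism is one plausible reading of why such a method would be robust when logged trajectories come from noisy dynamics, though the abstract itself does not give the mechanism.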

Taxonomy

Topics: Advanced MIMO Systems Optimization · Reinforcement Learning in Robotics · Software-Defined Networks and 5G