Enabling Safety-Critical Wireless Communications via Safe Reinforcement Learning
Haoran Peng, Tong Wu, Hang Liu, Weijia Zheng, Ying-Jun Angela Zhang, and Anna Scaglione

TL;DR
This paper introduces Safe-Deep Q-Learning, a reinforcement learning algorithm designed to strictly enforce safety constraints in critical wireless communication systems, demonstrated on UAV swarm control and post-disaster emergency communication scenarios.
Contribution
The paper presents a new safe RL algorithm that handles complex constraints and stochastic dynamics, with proven convergence and robustness for safety-critical wireless applications.
Findings
Achieves near-zero safety violations in UAV swarm control.
Outperforms existing constrained RL methods in safety adherence.
Demonstrates effectiveness in post-disaster emergency communication scenarios.
Abstract
Ensuring strict safety guarantees is a paramount challenge for emerging 5G/6G wireless systems, particularly as they increasingly govern mission-critical applications ranging from autonomous UAV swarms to industrial automation. While deep reinforcement learning (DRL) offers a promising solution for complex resource allocation, standard algorithms frequently violate essential constraints, such as QoS mandates and power limits, posing unacceptable risks of system failure and regulatory non-compliance. We propose Safe-Deep Q-Learning, a novel algorithm that simultaneously addresses three challenges: it handles mixed-integer nonconvex problems by approximating the Q-function, adapts to stochastic dynamics, and enforces dual-timescale constraints using integrated Lagrangian methods. Our framework features adaptive penalty scaling and constraint violation tracking, specifically tailored…
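The abstract does not give the update rules, so the following is only a minimal sketch of the general idea behind Lagrangian-constrained Q-learning, not the paper's Safe-Deep Q-Learning algorithm. It assumes a tabular Q-function (the paper approximates the Q-function), a single long-term average cost constraint, and a toy one-state problem. All function names, rewards, costs, and step sizes are hypothetical and chosen for illustration.

```python
import random


def penalized_reward(reward, cost, lam):
    """Lagrangian reward: the task reward minus the multiplier-weighted cost."""
    return reward - lam * cost


def dual_ascent_step(lam, avg_cost, budget, lr):
    """Raise the multiplier when the tracked cost exceeds the budget,
    lower it otherwise, and project back onto lam >= 0."""
    return max(0.0, lam + lr * (avg_cost - budget))


def q_update(q, s, a, r_pen, s_next, alpha, gamma):
    """Standard tabular Q-learning step applied to the penalized reward."""
    target = r_pen + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def run_toy(steps=5000, budget=0.1, seed=0):
    """Hypothetical one-state problem with two actions:
    action 0 earns reward 1.0 but incurs cost 1.0 (a constraint violation),
    action 1 earns reward 0.5 with no cost."""
    rng = random.Random(seed)
    rewards, costs = [1.0, 0.5], [1.0, 0.0]
    q = [[0.0, 0.0]]
    lam, avg_cost = 0.0, 0.0
    for _ in range(steps):
        # Epsilon-greedy action selection on the penalized Q-values.
        if rng.random() < 0.1:
            a = rng.randrange(2)
        else:
            a = 0 if q[0][0] >= q[0][1] else 1
        r_pen = penalized_reward(rewards[a], costs[a], lam)
        q_update(q, 0, a, r_pen, 0, alpha=0.1, gamma=0.0)
        # Violation tracking: exponential moving average of observed cost.
        avg_cost += 0.01 * (costs[a] - avg_cost)
        # The multiplier uses a smaller step size than the Q-update.
        lam = dual_ascent_step(lam, avg_cost, budget, lr=0.01)
    return q, lam, avg_cost


if __name__ == "__main__":
    q, lam, avg_cost = run_toy()
    print(f"lambda={lam:.3f}, tracked average cost={avg_cost:.3f}")
```

In this sketch the multiplier plays the role of a penalty scale that adapts to the tracked violation level: it grows while the running cost stays above the budget, which makes the costly action less attractive under the penalized Q-values. How the paper handles its dual-timescale constraints and function approximation is not specified in the visible text.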
