ReaCritic: Reasoning Transformer-based DRL Critic-model Scaling For Wireless Networks

Feiran You; Hongyang Du

arXiv:2505.10992 · cs.LG · February 19, 2026

TL;DR

ReaCritic introduces a transformer-based critic model that incorporates reasoning capabilities into DRL, significantly improving convergence speed and performance in wireless network management and control tasks.

Contribution

The paper presents ReaCritic, a novel reasoning transformer-based critic model that enhances DRL adaptability and generalization in complex, dynamic wireless environments.

Findings

01 Improves convergence speed in HetNet scenarios

02 Enhances final performance across various tasks

03 Compatible with multiple DRL algorithms

Abstract

Heterogeneous Networks (HetNets) pose critical challenges for intelligent management due to the diverse user requirements and time-varying wireless conditions. These factors introduce significant decision complexity, which limits the adaptability of existing Deep Reinforcement Learning (DRL) methods. In many DRL algorithms, especially those involving value-based or actor-critic structures, the critic component plays a key role in guiding policy learning by estimating value functions. However, conventional critic models often use shallow architectures that map observations directly to scalar estimates, limiting their ability to handle multi-task complexity. In contrast, recent progress in inference-time scaling of Large Language Models (LLMs) has shown that generating intermediate reasoning steps can significantly improve decision quality. Motivated by this, we propose ReaCritic, a…
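To make the baseline concrete, here is a minimal sketch of the kind of conventional shallow critic the abstract contrasts against: a small network that maps an observation and action directly to one scalar value estimate. It is not ReaCritic, whose architecture is not described in the text available here. The class name `ShallowCritic`, the layer sizes, the random initialization, and the example inputs are all assumptions made for illustration.

```python
import math
import random


class ShallowCritic:
    """Hypothetical two-layer MLP critic: (observation, action) -> scalar value.

    Illustrative only; sizes and initialization are assumptions, not taken
    from the paper.
    """

    def __init__(self, obs_dim, act_dim, hidden_dim=16, seed=0):
        rng = random.Random(seed)  # fixed seed so the sketch is reproducible
        in_dim = obs_dim + act_dim
        s1 = 1.0 / math.sqrt(in_dim)
        s2 = 1.0 / math.sqrt(hidden_dim)
        # Hidden layer weights: hidden_dim rows of in_dim entries each.
        self.w1 = [[rng.uniform(-s1, s1) for _ in range(in_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        # Output layer: a single unit producing the scalar estimate.
        self.w2 = [rng.uniform(-s2, s2) for _ in range(hidden_dim)]
        self.b2 = 0.0

    def value(self, obs, action):
        # Concatenate observation and action into one input vector.
        x = list(obs) + list(action)
        # One ReLU hidden layer, then a linear readout to a scalar.
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return sum(w * h for w, h in zip(self.w2, hidden)) + self.b2


critic = ShallowCritic(obs_dim=4, act_dim=2)
q = critic.value([0.1, -0.3, 0.5, 0.0], [1.0, 0.0])
print(q)
```

The whole computation is a single forward pass with no intermediate steps, which is the direct observation-to-scalar mapping the abstract identifies as a limitation under multi-task complexity.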

Taxonomy

Topics: Software-Defined Networks and 5G · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings