Dynamics of Resource Allocation in O-RANs: An In-depth Exploration of On-Policy and Off-Policy Deep Reinforcement Learning for Real-Time Applications
Manal Mehdaoui, Amine Abouaomar

TL;DR
This paper compares an on-policy and an off-policy deep reinforcement learning model, PPO and ACER, for resource allocation in O-RAN. It validates previously reported results and draws insights for latency-sensitive applications.
Contribution
It provides a replication study validating the effectiveness of PPO and ACER in O-RAN resource management, emphasizing their performance differences and practical implications.
Findings
Both models outperform greedy algorithms in O-RAN.
PPO balances energy use and latency effectively.
ACER converges faster in resource allocation tasks.
Abstract
Deep Reinforcement Learning (DRL) is a powerful tool for addressing complex challenges in mobile networks. This paper investigates the application of two DRL models, one on-policy and one off-policy, to resource allocation in Open Radio Access Networks (O-RAN). The on-policy model is Proximal Policy Optimization (PPO), and the off-policy model is Sample Efficient Actor-Critic with Experience Replay (ACER). The study focuses on the resource allocation challenges posed by a Quality of Service (QoS) application with strict requirements. Motivated by the original work of Nessrine Hammami and Kim Khoa Nguyen, this study is a replication that aims to validate their findings. PPO and ACER are evaluated within the same experimental setup, in a scenario with both latency-sensitive and latency-tolerant users, and their performance is compared. The aim is to…
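The core difference between the two models can be illustrated with a minimal sketch. It is not the authors' implementation, and the paper's hyperparameters are not given here. PPO (on-policy) optimizes a clipped surrogate objective on samples freshly collected by the current policy. ACER (off-policy) reuses older transitions from an experience replay buffer and corrects for the policy mismatch with a truncated importance weight plus a bias-correction coefficient. The function names, the buffer class, and the values `eps=0.2` and `c=10.0` are hypothetical, illustrative choices.

```python
import random
from collections import deque


def ppo_clipped_objective(ratios, advantages, eps=0.2):
    """PPO clipped surrogate (to be maximized), averaged over a batch.

    ratios[i] = pi_theta(a_i|s_i) / pi_theta_old(a_i|s_i), computed on
    samples collected by the current policy (on-policy data).
    eps=0.2 is an illustrative clip range, not the paper's setting.
    """
    total = 0.0
    for r, adv in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms.
        total += min(r * adv, clipped * adv)
    return total / len(ratios)


def acer_truncated_weight(pi_prob, mu_prob, c=10.0):
    """ACER-style truncated importance weight min(c, pi/mu).

    mu_prob is the probability the older behaviour policy assigned to the
    replayed action (off-policy data); truncation at c bounds the variance.
    """
    return min(c, pi_prob / mu_prob)


def acer_bias_correction_coeff(pi_prob, mu_prob, c=10.0):
    """Coefficient [1 - c/rho]_+ that weights ACER's bias-correction term,
    compensating for the part of the weight removed by truncation."""
    rho = pi_prob / mu_prob
    return max(0.0, 1.0 - c / rho)


class ReplayBuffer:
    """Hypothetical fixed-capacity FIFO buffer of past transitions."""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)  # oldest entries are evicted first

    def add(self, transition):
        self.buf.append(transition)

    def sample(self, k, rng):
        # Uniform sampling without replacement from the stored transitions.
        return rng.sample(list(self.buf), k)

    def __len__(self):
        return len(self.buf)
```

In this sketch, PPO would discard its batch after a few update epochs, whereas ACER keeps drawing from `ReplayBuffer`. This reuse of past data is the usual explanation for the sample efficiency of off-policy methods, and the truncation keeps the importance weights from exploding when the current and behaviour policies diverge.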
Taxonomy
Topics: Age of Information Optimization · Energy Efficient Wireless Sensor Networks · Energy Harvesting in Wireless Networks
Methods: Retrace · Softmax · Convolution · Experience Replay · Entropy Regularization · Dense Connections · Proximal Policy Optimization · Trust Region Policy Optimization
