Deep Reinforcement Learning for Uplink Multi-Carrier Non-Orthogonal Multiple Access Resource Allocation Using Buffer State Information
Eike-Manuel Bansbach, Yigit Kiyak, Laurent Schmalen

TL;DR
This paper introduces a deep reinforcement learning-based scheduler for uplink multi-carrier NOMA systems that uses buffer state information to optimize resource allocation and outperforms benchmark schedulers in simulations.
Contribution
It presents a novel actor-critic reinforcement learning scheduler that effectively incorporates buffer state information for uplink NOMA resource allocation.
Findings
The proposed scheduler outperforms benchmark schedulers in simulations.
Buffer state information improves scheduling efficiency.
Novel techniques stabilize and accelerate RL training.
Abstract
For orthogonal multiple access (OMA) systems, the number of served user equipments (UEs) is limited to the number of available orthogonal resources. On the other hand, non-orthogonal multiple access (NOMA) schemes allow multiple UEs to use the same orthogonal resource. This extra degree of freedom introduces new challenges for resource allocation. Buffer state information (BSI), like the size and age of packets waiting for transmission, can be used to improve scheduling in OMA systems. In this paper, we investigate the impact of BSI on the performance of a centralized scheduler in an uplink multi-carrier NOMA scenario with UEs having various data rate and latency requirements. To handle the large combinatorial space of allocating UEs to the resources, we propose a novel scheduler based on actor-critic reinforcement learning incorporating BSI. Training and evaluation are carried out…
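To give a feel for the "large combinatorial space" mentioned in the abstract, the following is a minimal counting sketch, not the paper's system model. It assumes that each orthogonal resource independently serves any subset of at most `max_ues_per_resource` UEs and that a UE may appear on several resources. The function names and the example values (10 UEs, 4 resources) are hypothetical and chosen only for illustration.

```python
from math import comb


def allocations_per_resource(num_ues: int, max_ues_per_resource: int) -> int:
    """Count the UE subsets of size 0..max_ues_per_resource for one resource."""
    return sum(comb(num_ues, j) for j in range(max_ues_per_resource + 1))


def allocation_space_size(num_ues: int, num_resources: int,
                          max_ues_per_resource: int) -> int:
    """Count joint allocations when resources are assigned independently.

    Assumption (not from the paper): a UE may be scheduled on several
    resources at once, so the per-resource choices simply multiply.
    """
    per_resource = allocations_per_resource(num_ues, max_ues_per_resource)
    return per_resource ** num_resources


# OMA-like case: at most one UE per orthogonal resource.
oma_size = allocation_space_size(num_ues=10, num_resources=4,
                                 max_ues_per_resource=1)

# NOMA-like case: up to two UEs may share the same orthogonal resource.
noma_size = allocation_space_size(num_ues=10, num_resources=4,
                                  max_ues_per_resource=2)

print(f"OMA-like allocations:  {oma_size}")
print(f"NOMA-like allocations: {noma_size}")
```

Under these assumptions, allowing just two UEs per resource already grows the space from 14641 to 9834496 allocations, which suggests why exhaustive search becomes impractical and a learned scheduler is attractive.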
Taxonomy
Topics: Advanced Wireless Communication Technologies · Age of Information Optimization · Energy Harvesting in Wireless Networks
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
