Post-Decision State-Based Online Learning for Delay-Energy-Aware Flow Allocation in Wireless Systems
Mahesh Ganesh Bhat, Shana Moothedath, and Prasanna Chaporkar

TL;DR
This paper presents a structure-aware reinforcement learning method that uses post-decision states (PDS) for delay- and energy-aware flow allocation across 5G User Plane Functions (UPFs). By exploiting the structure of the underlying MDP, it converges faster than standard Q-learning and achieves a lower long-term cost.
Contribution
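It introduces a post-decision state (PDS) based value iteration algorithm that exploits the structure of the MDP by separating action-controlled dynamics from exogenous factors, so that learning proceeds without prior statistical knowledge of the exogenous variables (flow arrivals and departures).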
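For readers unfamiliar with post-decision states, the generic PDS decomposition can be written as follows. This is the textbook form, not the paper's own notation: the symbols $f$, $c_k$, $c_u$, $p_u$, $\gamma$, and $\alpha_t$ are assumptions introduced here, and the known transition is taken to be deterministic for simplicity.

```latex
% Known, action-controlled step, then unknown exogenous step
\tilde{s} = f(s, a), \qquad s' \sim p_u(\cdot \mid \tilde{s})

% Bellman equations split at the post-decision state
\tilde{V}^*(\tilde{s}) = c_u(\tilde{s}) + \gamma \sum_{s'} p_u(s' \mid \tilde{s})\, V^*(s')
V^*(s) = \min_{a} \left[ c_k(s, a) + \tilde{V}^*\big(f(s, a)\big) \right]

% Sample-based update: the expectation over p_u is replaced by one observed transition
\tilde{V}_{t+1}(\tilde{s}_t) = (1 - \alpha_t)\, \tilde{V}_t(\tilde{s}_t)
  + \alpha_t \left[ c_u(\tilde{s}_t) + \gamma\, V_t(s_{t+1}) \right]
```

Only the expectation over $p_u$ has to be learned from samples; the minimization over actions uses the known cost $c_k$ and the known map $f$ directly, which is where the efficiency gain over learning a full state-action table comes from.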
Findings
Faster convergence compared to standard Q-learning
Lower long-term cost in resource allocation
Effective in heterogeneous 5G UPFs
Abstract
We develop a structure-aware reinforcement learning (RL) approach for delay- and energy-aware flow allocation in 5G User Plane Functions (UPFs). We consider a dynamic system with heterogeneous UPFs of varying capacities that handle stochastic arrivals of flow types, each with distinct rate requirements. We model the system as a Markov decision process (MDP) to capture the stochastic nature of flow arrivals and departures (possibly unknown), as well as the impact of flow allocation in the system. To solve this problem, we propose a post-decision state (PDS) based value iteration algorithm that exploits the underlying structure of the MDP. By separating action-controlled dynamics from exogenous factors, PDS enables faster convergence and efficient adaptive flow allocation, even in the absence of statistical knowledge about exogenous variables. Simulation results demonstrate that…
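The idea of separating action-controlled dynamics from exogenous factors can be illustrated with a minimal Python sketch. It is a toy model and not the authors' algorithm or system model: a single UPF with a fixed capacity, one pending flow request per slot, and admit/reject actions. All names and parameters (`pds_learning`, `capacity`, `p_arrival`, `p_departure`, `energy_cost`, `reject_cost`) are hypothetical. The learner knows the cost and the effect of its own action, and never reads the arrival or departure probabilities; it only observes sampled outcomes.

```python
import random


def pds_learning(capacity=4, p_arrival=0.6, p_departure=0.3,
                 energy_cost=1.0, reject_cost=5.0, gamma=0.9,
                 steps=20000, seed=0):
    """Toy PDS value learning for admit/reject on a single server (assumed model)."""
    rng = random.Random(seed)
    # Post-decision state = load right after the admit/reject decision.
    v_pds = [0.0] * (capacity + 1)
    visits = [0] * (capacity + 1)

    def actions(n, pending):
        # Admitting (1) is feasible only if a flow is pending and there is room.
        return (0, 1) if pending and n < capacity else (0,)

    def known_cost(n, pending, a):
        # Known cost: energy grows with post-decision load; rejecting is penalized.
        penalty = reject_cost if pending and a == 0 else 0.0
        return energy_cost * (n + a) + penalty

    def greedy(n, pending):
        # Known cost plus learned PDS value; no model of the exogenous part needed.
        return min(actions(n, pending),
                   key=lambda a: known_cost(n, pending, a) + v_pds[n + a])

    n, pending = 0, True
    for _ in range(steps):
        a = greedy(n, pending)
        pds = n + a  # known, deterministic effect of the action

        # Exogenous step, sampled by the environment (unknown to the learner).
        n_next = sum(1 for _ in range(pds) if rng.random() >= p_departure)
        pending_next = rng.random() < p_arrival

        # Value of the next state under the current PDS estimate.
        a_next = greedy(n_next, pending_next)
        v_next = known_cost(n_next, pending_next, a_next) + v_pds[n_next + a_next]

        # Stochastic-approximation update of the visited PDS only.
        visits[pds] += 1
        alpha = 1.0 / visits[pds]
        v_pds[pds] += alpha * (gamma * v_next - v_pds[pds])

        n, pending = n_next, pending_next

    return v_pds, greedy


if __name__ == "__main__":
    values, policy = pds_learning()
    print("PDS values:", [round(v, 2) for v in values])
    print("Admit at load 0 with a pending flow?", policy(0, True) == 1)
```

The sketch acts greedily without explicit exploration and uses a per-state `1/visits` step size; both are simplifications chosen for brevity, so post-decision loads that the greedy policy never reaches keep their initial value of zero.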
Taxonomy
Topics: Advanced MIMO Systems Optimization · Wireless Networks and Protocols · Advanced Wireless Network Optimization
