Deep Reinforcement Learning with Symmetric Prior for Predictive Power Allocation to Mobile Users
Jianyu Zhao, Chenyang Yang

TL;DR
This paper introduces a symmetric prior in deep reinforcement learning for power allocation in mobile video streaming, significantly reducing training complexity and model size while maintaining performance.
Contribution
It proposes a symmetric prior-based neural network design for DDPG, reducing sampling complexity and model size in wireless resource allocation tasks.
Findings
Free model parameters of the DDPG compressed by 2/K^2
Training episodes reduced by about one third for K=10
Maintains performance comparable to vanilla DDPG
Abstract
Deep reinforcement learning has been applied to a variety of wireless tasks, but it is known to incur high training and inference complexity. In this paper, we use the deep deterministic policy gradient (DDPG) algorithm to optimize predictive power allocation among K mobile users requesting video streaming, minimizing the energy consumption of the network under the no-stalling constraint of each user. To reduce the sampling complexity and model size of the DDPG, we exploit a symmetric prior inherent in the actor and critic networks, namely their permutation invariance and equivariance properties, to design the neural networks. Our analysis shows that the free model parameters of the DDPG can be compressed by 2/K^2. Simulation results demonstrate that the number of episodes required by the learning model with the symmetric prior to achieve the same performance as the vanilla policy is reduced by…
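To make the permutation properties in the abstract concrete, here is a minimal sketch of one common way to build a permutation-equivariant layer by weight sharing. It assumes that each of the K users has its own feature vector, that one matrix U is applied to a user's own features, and that one matrix V is applied to the sum of the other users' features. The names equivariant_layer, invariant_readout, param_counts, U, and V are hypothetical and are not taken from the paper, biases are omitted, and whether the paper's actor and critic use exactly this structure is an assumption. Under these assumptions, a dense layer needs K^2 weight blocks while the shared layer needs only 2, which is one reading of the 2/K^2 figure.

```python
def equivariant_layer(x, U, V):
    """Permutation-equivariant linear layer (illustrative, no bias).

    x: list of K per-user feature vectors, each of length d_in.
    U, V: d_out x d_in matrices given as lists of rows (hypothetical names).
    Output for user k: U @ x_k + V @ (sum of x_j over all j != k).
    """
    K = len(x)
    d_in = len(x[0])
    # Sum of every user's features, computed once and reused.
    total = [sum(x[k][i] for k in range(K)) for i in range(d_in)]
    out = []
    for k in range(K):
        others = [total[i] - x[k][i] for i in range(d_in)]
        out.append([
            sum(u_row[i] * x[k][i] + v_row[i] * others[i] for i in range(d_in))
            for u_row, v_row in zip(U, V)
        ])
    return out


def invariant_readout(h):
    """Permutation-invariant pooling: sum the per-user vectors elementwise."""
    return [sum(col) for col in zip(*h)]


def param_counts(K, d_in, d_out):
    """Weight counts for a dense layer versus the shared-block layer."""
    dense = (K * d_out) * (K * d_in)   # K^2 independent blocks
    shared = 2 * d_out * d_in          # only the U and V blocks
    return dense, shared


if __name__ == "__main__":
    dense, shared = param_counts(K=10, d_in=4, d_out=8)
    print(dense, shared, shared / dense)  # 3200 64 0.02, i.e. 2/K^2 for K=10
```

Reordering the users in x reorders the rows of the layer output in the same way (equivariance, the property suited to an actor that outputs one power value per user), while invariant_readout returns the same vector for any ordering (invariance, the property suited to a critic that outputs a single value).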
Taxonomy
Topics: Green IT and Sustainability · Advanced MIMO Systems Optimization · Energy Harvesting in Wireless Networks
Methods: Weight Decay · Adam · Dense Connections · Convolution · Experience Replay · Batch Normalization · Deep Deterministic Policy Gradient
