Distributed-Training-and-Execution Multi-Agent Reinforcement Learning for Power Control in HetNet
Kaidi Xu, Nguyen Van Huynh, Geoffrey Ye Li

TL;DR
This paper introduces a multi-agent deep reinforcement learning (MADRL) approach with a penalty-based Q learning (PQL) algorithm for power control in heterogeneous networks (HetNets). Each access point makes decisions from local information only, so agents can cooperate without global channel state information.
Contribution
The paper proposes a novel penalty-based Q learning algorithm for MADRL that enhances cooperation and efficiency in distributed power control for HetNets.
Findings
PQL outperforms existing MADRL algorithms in dynamic environments.
The method learns better power control policies at lower computational complexity.
Agents learn more effective cooperation strategies through regularized policy updates.
Abstract
In heterogeneous networks (HetNets), the overlap of small cells and the macro cell causes severe cross-tier interference. Although some approaches to this problem exist, they usually require global channel state information, which is hard to obtain in practice, and they yield a sub-optimal power allocation policy at high computational complexity. To overcome these limitations, we propose a multi-agent deep reinforcement learning (MADRL) based power control scheme for the HetNet, where each access point makes power control decisions independently based on local information. To promote cooperation among agents, we develop a penalty-based Q learning (PQL) algorithm for MADRL systems. By introducing regularization terms in the loss function, each agent tends to choose an experienced action with high reward when revisiting a state, and thus the policy updating speed slows down. In…
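The abstract does not give the exact form of the regularization terms, so the following is only a minimal sketch of the general idea, not the authors' PQL algorithm. It assumes a tabular Q function instead of a deep network, and a hypothetical squared-hinge penalty that discourages any action from being valued above the highest-reward action already experienced in that state. The class name `PenalizedQAgent` and the parameters `lr`, `gamma`, and `lam` are illustrative choices and do not come from the paper.

```python
import numpy as np


class PenalizedQAgent:
    """Independent tabular Q-learner with an assumed penalty term.

    Hypothetical illustration only: the penalty form is an assumption,
    not the loss function defined in the paper.
    """

    def __init__(self, n_states, n_actions, lr=0.1, gamma=0.9, lam=0.5):
        self.q = np.zeros((n_states, n_actions))
        # Best experienced action and its reward, per state (local memory).
        self.best_action = np.full(n_states, -1, dtype=int)
        self.best_reward = np.full(n_states, -np.inf)
        self.lr, self.gamma, self.lam = lr, gamma, lam

    def act(self, state):
        """Greedy action from the agent's own Q-table (local information)."""
        return int(np.argmax(self.q[state]))

    def update(self, s, a, r, s_next):
        # Remember the highest-reward action seen so far in state s.
        if r > self.best_reward[s]:
            self.best_reward[s] = r
            self.best_action[s] = a

        # Standard Q-learning (temporal-difference) step.
        td_target = r + self.gamma * self.q[s_next].max()
        self.q[s, a] += self.lr * (td_target - self.q[s, a])

        # Assumed penalty: 0.5 * lam * sum(max(0, Q(s, a') - Q(s, b))^2),
        # where b is the remembered action. One gradient step on it pulls
        # over-valued actions down and the remembered action up.
        b = self.best_action[s]
        gaps = np.maximum(self.q[s] - self.q[s, b], 0.0)  # zero at index b
        self.q[s] -= self.lr * self.lam * gaps
        self.q[s, b] += self.lr * self.lam * gaps.sum()
```

In this sketch, setting `lam=0` recovers plain independent Q-learning, which makes it easy to compare the two. A larger `lam` makes each agent stick more firmly to actions that paid off before, which matches the abstract's point that the policy changes more slowly.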
Taxonomy
Topics: Advanced MIMO Systems Optimization · Energy Harvesting in Wireless Networks · Wireless Body Area Networks
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
