# Deep reinforcement learning framework for joint optimization of multi-RAT UAV location and user association in heterogeneous networks

**Authors:** Mohamed G. Anany, Mahmoud M. Elmesalawy, Ahmed M. Abd El-Haleem, Ibrahim I. Ibrahim

PMC · DOI: 10.1038/s41598-025-22610-1 · 2025-11-07

## TL;DR

This paper proposes a deep reinforcement learning framework to optimize UAV placement and user association in 6G networks, improving performance metrics like data rate and energy efficiency.

## Contribution

A novel DRL-based framework for joint optimization of UAV location and user association in multi-RAT HetNets is introduced.

## Key findings

- The proposed framework improves the satisfaction index by up to 13% and the downlink data rate by up to 25%.
- Uplink power consumption and outage probability are reduced by up to 67% and 71%, respectively.
- Modified K-means is used for initialization and regret learning for user association; Jain’s fairness index and the number of framework iterations improve by up to 28% and 45%, respectively.
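
The fairness result in the abstract is reported with Jain’s fairness index. As a reminder of what that metric measures, here is a minimal sketch of the standard textbook formula, J = (Σx)² / (n · Σx²). The function name and the choice of per-user data rates as input are illustrative assumptions, not taken from the paper.

```python
def jain_fairness_index(allocations):
    """Standard Jain's fairness index: (sum x)^2 / (n * sum x^2).

    `allocations` is any non-empty sequence of non-negative per-user
    quantities (here assumed to be data rates; this is an assumption).
    Returns a value in (0, 1]; 1.0 means all users receive the same amount.
    """
    values = [float(v) for v in allocations]
    if not values:
        raise ValueError("allocations must be non-empty")
    total = sum(values)
    sum_of_squares = sum(v * v for v in values)
    if sum_of_squares == 0.0:
        # All-zero allocation: every user gets the same (nothing),
        # so treat it as perfectly equal by convention.
        return 1.0
    return (total * total) / (len(values) * sum_of_squares)


if __name__ == "__main__":
    print(jain_fairness_index([10, 10, 10, 10]))  # equal shares
    print(jain_fairness_index([1, 0, 0, 0]))      # one user takes everything
```

Equal allocations give an index of 1, while serving a single user out of n gives 1/n, so a higher index means rates are spread more evenly across users.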

## Abstract

The explosive growth of multimedia and Internet of Things (IoT) devices has led to a huge increase in data traffic requirements, together with demands for reduced power consumption, in 6G communications. In this work, a ground Multiple Radio Access Technology (Multi-RAT) Heterogeneous Network (HetNet) is considered, assisted by multiple UAVs, each carrying Multi-RAT base stations (i.e., LTE and Wi-Fi base stations), to utilize the unlicensed spectrum and provide on-demand assistance, more capacity, and coverage for diverse ground devices. A Satisfaction to Energy Ratio (SER) is introduced, defined as the ratio between the users’ satisfaction according to their requirements and the UAVs’ energy consumption. An iterative framework is proposed to maximize the SER by optimizing the UAVs’ 3D locations and the user association. The proposed framework uses a modified K-means algorithm for initialization, Deep Reinforcement Learning (DRL) to optimize the 3D locations of the UAVs, and regret learning to optimize the user association. Extensive simulations show improvements of up to 13%, 25%, 67%, 71%, 28%, and 45% in satisfaction index, downlink data rate, uplink power consumption, outage probability, Jain’s fairness index, and framework iterations, respectively. In addition, different DRL algorithms, observation scenarios, and training approaches are compared to select the best combination for the proposed framework.
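
To give a feel for how regret learning can drive user association, the sketch below implements generic regret matching for a single user choosing among candidate access points. It is not the paper's algorithm: the function names, the fixed per-access-point utilities, and the full-information update (the user is assumed to know what every alternative would have yielded) are all illustrative assumptions. In a real network the utilities would also change with other users' choices.

```python
import random


def regret_matching_probs(cumulative_regret):
    """Play each action with probability proportional to its positive regret."""
    positive = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positive)
    n = len(cumulative_regret)
    if total <= 0.0:
        # No action is regretted yet: fall back to a uniform choice.
        return [1.0 / n] * n
    return [p / total for p in positive]


def update_regrets(cumulative_regret, utilities, chosen):
    """Add (utility of each alternative - utility actually obtained)."""
    realized = utilities[chosen]
    return [r + (u - realized) for r, u in zip(cumulative_regret, utilities)]


def associate_user(utilities, rounds=500, seed=0):
    """Run regret matching for one user.

    `utilities[a]` is a hypothetical, fixed utility of associating with
    access point `a` (an assumption made only to keep the sketch small).
    Returns the final association probabilities.
    """
    rng = random.Random(seed)
    actions = range(len(utilities))
    regret = [0.0] * len(utilities)
    for _ in range(rounds):
        probs = regret_matching_probs(regret)
        chosen = rng.choices(actions, weights=probs)[0]
        regret = update_regrets(regret, utilities, chosen)
    return regret_matching_probs(regret)


if __name__ == "__main__":
    # Three candidate access points with made-up utilities.
    print(associate_user([0.2, 0.9, 0.5]))
```

With fixed utilities the probability mass concentrates on the highest-utility access point, because it is the only option whose cumulative regret stays positive. The appeal of this family of methods is that each user updates from its own observed utilities without a central optimizer.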

## Figures

25 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12594865/full.md

---
Source: https://tomesphere.com/paper/PMC12594865