Two-Layer Reinforcement Learning-Assisted Joint Beamforming and Trajectory Optimization for Multi-UAV Downlink Communications

Ruiqi Wang; Essra M. Ghoura; Omar Alhussein; Yuzhi Yang; Jing Ren; Shizhong Xu; Sami Muhaidat

arXiv:2601.12659 · eess.SP · April 20, 2026

TL;DR

This paper introduces a two-layer hierarchical framework that combines graph neural networks (GNNs) with multi-agent reinforcement learning to jointly optimize beamforming and trajectories for multi-UAV downlink communications in 6G non-terrestrial networks.

Contribution

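It presents a novel decoupled approach that splits the joint problem across two timescales: a topology-aware GNN beamformer handles interference on the short timescale, while trajectory planning is handled separately on the long timescale, which enables real-time adaptation.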

Findings

01

The GNN beamformer achieves sub-millisecond inference while explicitly extracting interference patterns.

02

Outperforms existing heuristics and deep learning methods in sum rate and convergence.

03

Demonstrates superior generalization in dynamic UAV communication scenarios.
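
For readers unfamiliar with the sum-rate metric named in finding 02, the following is a minimal sketch of how a downlink sum rate is commonly computed from per-user SINR. It is not taken from the paper: the single-antenna-user signal model, the channel layout `h[k][j]`, and the names `inner`, `sum_rate`, and `noise_power` are all illustrative assumptions.

```python
import math


def inner(h, w):
    """Return h^H w for two equal-length complex vectors (lists)."""
    return sum(hc.conjugate() * wc for hc, wc in zip(h, w))


def sum_rate(h, w, noise_power):
    """Sum rate in bit/s/Hz under an assumed interference-as-noise model.

    h[k][j]: channel vector from the transmitter serving user j to user k
             (hypothetical layout, chosen for this sketch)
    w[j]:    beamforming vector used for user j
    """
    num_users = len(w)
    total = 0.0
    for k in range(num_users):
        # Desired signal power received by user k.
        signal = abs(inner(h[k][k], w[k])) ** 2
        # Power leaking into user k from beams intended for other users.
        interference = sum(
            abs(inner(h[k][j], w[j])) ** 2
            for j in range(num_users)
            if j != k
        )
        sinr = signal / (interference + noise_power)
        total += math.log2(1.0 + sinr)
    return total


# Two users, one antenna per transmitter, no cross-link interference.
h_clean = [[[1 + 0j], [0j]], [[0j], [1 + 0j]]]
# Same direct links, but each user also hears the other user's beam.
h_interf = [[[1 + 0j], [1 + 0j]], [[1 + 0j], [1 + 0j]]]
w_unit = [[1 + 0j], [1 + 0j]]

rate_clean = sum_rate(h_clean, w_unit, noise_power=1.0)
rate_interf = sum_rate(h_interf, w_unit, noise_power=1.0)
```

In this toy setup the interference-free case gives each user an SINR of 1, and adding cross-link interference lowers the total, which is the effect that interference-aware beamforming tries to limit.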

Abstract

Unmanned aerial vehicles (UAVs) are pivotal for future 6G non-terrestrial networks, yet their high mobility creates a complex coupled optimization problem for beamforming and trajectory design. Existing numerical methods suffer from prohibitive latency, while standard deep learning often ignores dynamic interference topology, limiting scalability. To address these issues, this paper proposes a hierarchically decoupled framework synergizing graph neural networks (GNNs) with multi-agent reinforcement learning. Specifically, on the short timescale, we develop a topology-aware GNN beamformer by incorporating GraphNorm. By modeling the dynamic UAV-user association as a time-varying heterogeneous graph, this method explicitly extracts interference patterns to achieve sub-millisecond inference. On the long timescale, trajectory planning is modeled as a decentralized partially observable Markov…
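
The abstract mentions GraphNorm without describing it. As a hedged illustration, the sketch below implements GraphNorm as it is commonly defined: per feature dimension, node features within one graph are shifted by a learnable fraction `alpha` of the graph mean, scaled to unit variance, and then passed through an affine transform (`gamma`, `beta`). How the paper integrates this layer into its beamformer is not stated in the abstract; the function name `graph_norm` and the pure-Python list representation are assumptions made for this example.

```python
import math


def graph_norm(features, alpha, gamma, beta, eps=1e-5):
    """Apply GraphNorm to the node features of ONE graph.

    features: list of node feature vectors, shape (n_nodes, n_dims)
    alpha:    per-dimension learnable mean-shift weights
    gamma:    per-dimension learnable scale
    beta:     per-dimension learnable bias
    """
    n_nodes = len(features)
    n_dims = len(features[0])
    out = [[0.0] * n_dims for _ in range(n_nodes)]
    for j in range(n_dims):
        mean = sum(row[j] for row in features) / n_nodes
        # Subtract only a learnable fraction of the mean, so the layer
        # can keep graph-level information that full centering removes.
        shifted = [row[j] - alpha[j] * mean for row in features]
        var = sum(s * s for s in shifted) / n_nodes
        scale = gamma[j] / math.sqrt(var + eps)
        for i in range(n_nodes):
            out[i][j] = scale * shifted[i] + beta[j]
    return out


nodes = [[1.0], [3.0]]
# alpha = 1 recovers ordinary per-graph standardization.
full_shift = graph_norm(nodes, alpha=[1.0], gamma=[1.0], beta=[0.0])
# alpha = 0 rescales without centering, so the mean offset survives.
no_shift = graph_norm(nodes, alpha=[0.0], gamma=[1.0], beta=[0.0])
```

With `alpha` set to 1 the two nodes map to roughly -1 and +1; with `alpha` set to 0 both outputs stay positive, which shows how the learnable shift controls how much of the graph mean is retained.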
