Recent Advances in Transformer and Large Language Models for UAV Applications

Hamza Kheddar; Yassine Habchi; Mohamed Chahine Ghanem; Mustapha Hemis; Dusit Niyato

arXiv:2508.11834·cs.CV·August 19, 2025

TL;DR

This paper reviews recent Transformer-based models for UAVs, categorizing architectures, applications, and benchmarks, while highlighting challenges and future directions for improving UAV perception and autonomy.

Contribution

The paper provides a unified taxonomy of Transformer-based UAV models, compares recent developments, and reviews key datasets, challenges, and future research avenues.

Findings

01

Transformer architectures enhance UAV perception and decision-making.

02

Emerging applications include precision agriculture and autonomous navigation.

03

Computational efficiency and real-time deployment remain key challenges.

Abstract

The rapid advancement of Transformer-based models has reshaped the landscape of uncrewed aerial vehicle (UAV) systems by enhancing perception, decision-making, and autonomy. This review paper systematically categorizes and evaluates recent developments in Transformer architectures applied to UAVs, including attention mechanisms, CNN-Transformer hybrids, reinforcement learning Transformers, and large language models (LLMs). Unlike previous surveys, this work presents a unified taxonomy of Transformer-based UAV models, highlights emerging applications such as precision agriculture and autonomous navigation, and provides comparative analyses through structured tables and performance benchmarks. The paper also reviews key datasets, simulators, and evaluation metrics used in the field. Furthermore, it identifies existing gaps in the literature, outlines critical challenges in computational…
