UAVBench: An Open Benchmark Dataset for Autonomous and Agentic AI UAV Systems via LLM-Generated Flight Scenarios

Mohamed Amine Ferrag; Abderrahmane Lakas; Merouane Debbah

arXiv:2511.11252 · cs.AI · November 17, 2025

Open Access

TL;DR

UAVBench provides a comprehensive, standardized dataset and benchmark for evaluating the reasoning and decision-making capabilities of large language models in autonomous UAV systems, addressing a critical gap in systematic assessment tools.

Contribution

This work introduces UAVBench, a large, validated dataset of UAV flight scenarios and a reasoning-oriented extension, enabling detailed evaluation of LLMs in UAV-specific tasks and ethics-aware decision-making.

Findings

01. Strong performance in perception and policy reasoning by LLMs

02. Persistent challenges in ethics-aware decision-making

03. Resource constraints impact LLM reasoning abilities

Abstract

Autonomous aerial systems increasingly rely on large language models (LLMs) for mission planning, perception, and decision-making, yet the lack of standardized and physically grounded benchmarks limits systematic evaluation of their reasoning capabilities. To address this gap, we introduce UAVBench, an open benchmark dataset comprising 50,000 validated UAV flight scenarios generated through taxonomy-guided LLM prompting and multi-stage safety validation. Each scenario is encoded in a structured JSON schema that includes mission objectives, vehicle configuration, environmental conditions, and quantitative risk labels, providing a unified representation of UAV operations across diverse domains. Building on this foundation, we present UAVBench_MCQ, a reasoning-oriented extension containing 50,000 multiple-choice questions spanning ten cognitive and ethical reasoning styles, ranging from…
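As a rough illustration of the structured scenario representation the abstract describes, the sketch below checks that a record carries the four components it names: mission objectives, vehicle configuration, environmental conditions, and risk labels. The key names, the nesting, the `missing_keys` helper, and every value in the sample record are assumptions made for this example; the actual UAVBench schema may differ.

```python
import json

# Hypothetical top-level keys, named after the four components the abstract lists.
# The real UAVBench schema may use different names and nesting.
REQUIRED_KEYS = (
    "mission_objectives",
    "vehicle_configuration",
    "environmental_conditions",
    "risk_labels",
)


def missing_keys(record: dict) -> list[str]:
    """Return the required top-level keys that are absent from a scenario record."""
    return [key for key in REQUIRED_KEYS if key not in record]


# Illustrative record only; every field and value below is invented for this sketch.
example_json = """
{
  "mission_objectives": ["inspect power line segment"],
  "vehicle_configuration": {"type": "quadrotor", "max_speed_mps": 15.0},
  "environmental_conditions": {"wind_speed_mps": 4.2, "visibility_km": 8.0},
  "risk_labels": {"collision_risk": 0.12}
}
"""

scenario = json.loads(example_json)
print(missing_keys(scenario))  # prints []
```

A check of this kind is only a first structural gate. The multi-stage safety validation mentioned in the abstract is not specified in this listing and is not reproduced here.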

Taxonomy

Topics: Air Traffic Management and Optimization · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI