UAVBench: An Open Benchmark Dataset for Autonomous and Agentic AI UAV Systems via LLM-Generated Flight Scenarios
Mohamed Amine Ferrag, Abderrahmane Lakas, Merouane Debbah

TL;DR
UAVBench provides a comprehensive, standardized dataset and benchmark for evaluating the reasoning and decision-making capabilities of large language models in autonomous UAV systems, addressing a critical gap in systematic assessment tools.
Contribution
This work introduces UAVBench, a large, validated dataset of UAV flight scenarios and a reasoning-oriented extension, enabling detailed evaluation of LLMs in UAV-specific tasks and ethics-aware decision-making.
Findings
LLMs perform strongly on perception and policy reasoning
Persistent challenges in ethics-aware decision-making
Resource constraints impact LLM reasoning abilities
Abstract
Autonomous aerial systems increasingly rely on large language models (LLMs) for mission planning, perception, and decision-making, yet the lack of standardized and physically grounded benchmarks limits systematic evaluation of their reasoning capabilities. To address this gap, we introduce UAVBench, an open benchmark dataset comprising 50,000 validated UAV flight scenarios generated through taxonomy-guided LLM prompting and multi-stage safety validation. Each scenario is encoded in a structured JSON schema that includes mission objectives, vehicle configuration, environmental conditions, and quantitative risk labels, providing a unified representation of UAV operations across diverse domains. Building on this foundation, we present UAVBench_MCQ, a reasoning-oriented extension containing 50,000 multiple-choice questions spanning ten cognitive and ethical reasoning styles, ranging from…
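To make the abstract's description concrete, here is a minimal Python sketch of what checking a scenario record and scoring multiple-choice answers per reasoning style might look like. It is not the UAVBench schema or evaluation code: the top-level keys (`mission`, `vehicle`, `environment`, `risk`), the question fields (`id`, `style`, `answer`), the style labels, and all values are hypothetical names chosen only to mirror the four groups the abstract lists.

```python
import json
from collections import defaultdict

# Hypothetical top-level keys, named after the four groups in the abstract:
# mission objectives, vehicle configuration, environmental conditions, risk labels.
REQUIRED_KEYS = {"mission", "vehicle", "environment", "risk"}


def missing_keys(scenario: dict) -> set:
    """Return the required top-level keys that a scenario record lacks."""
    return REQUIRED_KEYS - set(scenario)


def accuracy_by_style(questions, predictions):
    """Compute accuracy per reasoning style for multiple-choice answers.

    questions:   list of dicts with hypothetical keys "id", "style", "answer".
    predictions: dict mapping a question id to the chosen option label.
    An unanswered question counts as incorrect.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for q in questions:
        total[q["style"]] += 1
        if predictions.get(q["id"]) == q["answer"]:
            correct[q["style"]] += 1
    return {style: correct[style] / total[style] for style in total}


# Illustrative record only; every field name and value is made up.
example_scenario = json.loads("""
{
  "mission": {"objective": "survey"},
  "vehicle": {"type": "quadrotor"},
  "environment": {"wind_speed_mps": 4.0},
  "risk": {"score": 0.2}
}
""")

# Illustrative questions with placeholder style labels.
example_questions = [
    {"id": "q1", "style": "perception", "answer": "A"},
    {"id": "q2", "style": "ethics", "answer": "C"},
    {"id": "q3", "style": "ethics", "answer": "B"},
]
example_predictions = {"q1": "A", "q2": "C", "q3": "D"}
```

Grouping accuracy by style, rather than reporting a single number, matches the way the findings contrast perception and policy reasoning with ethics-aware decision-making.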
Taxonomy
Topics: Air Traffic Management and Optimization · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI
