$\alpha^3$-Bench: A Unified Benchmark of Safety, Robustness, and Efficiency for LLM-Based UAV Agents over 6G Networks
Mohamed Amine Ferrag, Abderrahmane Lakas, Merouane Debbah

TL;DR
This paper introduces $\alpha^3$-Bench, a comprehensive benchmark for evaluating the safety, robustness, and efficiency of LLM-based UAV agents operating over dynamic 6G networks, addressing a gap in realistic autonomous UAV assessments.
Contribution
It presents a novel multi-turn conversational benchmark with a large dataset, a composite evaluation metric, and insights into model performance under network variability.
Findings
Models achieve high success and safety but vary in robustness.
Network conditions significantly impact model efficiency.
The benchmark enables comprehensive evaluation of LLM UAV agents.
Abstract
Large Language Models (LLMs) are increasingly used as high-level controllers for autonomous Unmanned Aerial Vehicle (UAV) missions. However, existing evaluations rarely assess whether such agents remain safe, protocol-compliant, and effective under realistic next-generation networking constraints. This paper introduces $\alpha^3$-Bench, a benchmark for evaluating LLM-driven UAV autonomy as a multi-turn conversational reasoning and control problem operating under dynamic 6G conditions. Each mission is formulated as a language-mediated control loop between an LLM-based UAV agent and a human operator, where decisions must satisfy strict schema validity, mission policies, speaker alternation, and safety constraints while adapting to fluctuating network slices, latency, jitter, packet loss, throughput, and edge load variations. To reflect modern agentic workflows, $\alpha^3$-Bench…
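As a rough illustration of two of the constraints named in the abstract (schema validity and speaker alternation), the sketch below checks a multi-turn conversation turn by turn. It is not the benchmark's implementation: the field names (`speaker`, `content`, `network`), the speaker labels, and the function `check_turns` are all hypothetical, chosen only to make the idea concrete.

```python
# Hypothetical turn schema; the paper's actual schema is not given in this excerpt.
REQUIRED_FIELDS = {"speaker", "content", "network"}
SPEAKERS = ("operator", "uav_agent")  # assumed labels for the two parties


def check_turns(turns):
    """Return a list of (turn_index, problem) pairs for one conversation."""
    problems = []
    previous_speaker = None
    for i, turn in enumerate(turns):
        # Schema validity: every turn must carry all required fields.
        missing = REQUIRED_FIELDS - set(turn)
        if missing:
            problems.append((i, f"missing fields: {sorted(missing)}"))
        # Speaker alternation: the same party may not speak twice in a row.
        speaker = turn.get("speaker")
        if speaker not in SPEAKERS:
            problems.append((i, f"unknown speaker: {speaker!r}"))
        elif speaker == previous_speaker:
            problems.append((i, "speaker did not alternate"))
        previous_speaker = speaker
    return problems


conversation = [
    {"speaker": "operator", "content": "Survey sector A.", "network": {"latency_ms": 12.0}},
    {"speaker": "uav_agent", "content": "Acknowledged.", "network": {"latency_ms": 15.0}},
    {"speaker": "uav_agent", "content": "Climbing."},  # violates both checks
]
print(check_turns(conversation))
```

A real harness would presumably also score mission policies, safety constraints, and behavior under the changing network state, which this sketch leaves out.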
Taxonomy
Topics: UAV Applications and Optimization · Advanced Neural Network Applications · Software-Defined Networks and 5G
