$\alpha^3$-SecBench: A Large-Scale Evaluation Suite of Security, Resilience, and Trust for LLM-based UAV Agents over 6G Networks

Mohamed Amine Ferrag; Abderrahmane Lakas; Merouane Debbah

arXiv:2601.18754 · cs.CR · January 27, 2026

TL;DR

This paper introduces $\alpha^3$-SecBench, a comprehensive evaluation suite for testing the security, resilience, and trustworthiness of LLM-based UAV agents in adversarial 6G network environments, revealing significant gaps in current model capabilities.

Contribution

The paper presents the first large-scale benchmark for assessing the security and resilience of LLM-based UAV agents under realistic adversarial scenarios, covering seven autonomy layers and multiple threat types.

Findings

01

Many models detect anomalies reliably but struggle with mitigation and vulnerability attribution.

02

Overall security scores vary widely, from 12.9% to 57.1%.

03

A significant gap exists between anomaly detection and autonomous security decision-making.

Abstract

Autonomous unmanned aerial vehicle (UAV) systems are increasingly deployed in safety-critical, networked environments where they must operate reliably in the presence of malicious adversaries. While recent benchmarks have evaluated large language model (LLM)-based UAV agents in reasoning, navigation, and efficiency, systematic assessment of security, resilience, and trust under adversarial conditions remains largely unexplored, particularly in emerging 6G-enabled settings. We introduce $\alpha^3$-SecBench, the first large-scale evaluation suite for assessing the security-aware autonomy of LLM-based UAV agents under realistic adversarial interference. Building on multi-turn conversational UAV missions from $\alpha^3$-Bench, the framework augments benign episodes with 20,000 validated security overlay attack scenarios targeting seven autonomy layers, including sensing, perception,…

Taxonomy

Topics: UAV Applications and Optimization · Adversarial Robustness in Machine Learning · Air Traffic Management and Optimization