AirCopBench: A Benchmark for Multi-drone Collaborative Embodied Perception and Reasoning

Jirong Zha; Yuxuan Fan; Tianyu Zhang; Geng Chen; Yingfeng Chen; Chen Gao; Xinlei Chen

arXiv:2511.11025 · cs.CV · November 25, 2025

TL;DR

AirCopBench is a new comprehensive benchmark for evaluating multimodal large language models in multi-drone collaborative perception, addressing real-world challenges and revealing significant performance gaps.

Contribution

This work introduces AirCopBench, the first benchmark for multi-drone collaborative embodied perception. It features diverse tasks and real-world data, and it evaluates model transferability under degraded perception conditions.

Findings

01

Significant performance gaps between models and humans in collaborative perception tasks.

02

Fine-tuning improves model performance and demonstrates the feasibility of sim-to-real transfer.

03

The benchmark covers diverse tasks and real-world degraded perception scenarios.

Abstract

Multimodal Large Language Models (MLLMs) have shown promise in single-agent vision tasks, yet benchmarks for evaluating multi-agent collaborative perception remain scarce. This gap is critical, as multi-drone systems provide enhanced coverage, robustness, and collaboration compared to single-sensor setups. Existing multi-image benchmarks mainly target basic perception tasks using high-quality single-agent images, thus failing to evaluate MLLMs in more complex, egocentric collaborative scenarios, especially under real-world degraded perception conditions. To address these challenges, we introduce AirCopBench, the first comprehensive benchmark designed to evaluate MLLMs in embodied aerial collaborative perception under challenging perceptual conditions. AirCopBench includes 14.6k+ questions derived from both simulator and real-world data, spanning four key task dimensions: Scene…

Videos

AirCopBench: A Benchmark for Multi-drone Collaborative Embodied Perception and Reasoning (Underline)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications