AirCopBench: A Benchmark for Multi-drone Collaborative Embodied Perception and Reasoning
Jirong Zha, Yuxuan Fan, Tianyu Zhang, Geng Chen, Yingfeng Chen, Chen Gao, Xinlei Chen

TL;DR
AirCopBench is a new comprehensive benchmark for evaluating multimodal large language models (MLLMs) on multi-drone collaborative perception under real-world degraded perception conditions. It reveals significant performance gaps between models and humans.
Contribution
This work introduces AirCopBench, the first benchmark for multi-drone embodied perception, including diverse tasks, real-world data, and evaluation of model transferability under degraded conditions.
Findings
Significant performance gaps between models and humans in collaborative perception tasks.
Fine-tuning improves model performance and shows that sim-to-real transfer is feasible.
The benchmark covers diverse tasks and real-world degraded perception scenarios.
Abstract
Multimodal Large Language Models (MLLMs) have shown promise in single-agent vision tasks, yet benchmarks for evaluating multi-agent collaborative perception remain scarce. This gap is critical, as multi-drone systems provide enhanced coverage, robustness, and collaboration compared to single-sensor setups. Existing multi-image benchmarks mainly target basic perception tasks using high-quality single-agent images, thus failing to evaluate MLLMs in more complex, egocentric collaborative scenarios, especially under real-world degraded perception conditions. To address these challenges, we introduce AirCopBench, the first comprehensive benchmark designed to evaluate MLLMs in embodied aerial collaborative perception under challenging perceptual conditions. AirCopBench includes 14.6k+ questions derived from both simulator and real-world data, spanning four key task dimensions: Scene…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
