MM-UAVBench: How Well Do Multimodal Large Language Models See, Think, and Plan in Low-Altitude UAV Scenarios?

Shiqi Dai; Zizhi Ma; Zhicong Luo; Xuesong Yang; Yibin Huang; Wanyue Zhang; Chi Chen; Zonghao Guo; Wang Xu; Yufei Sun; Maosong Sun

arXiv:2512.23219 · cs.CV · December 30, 2025

TL;DR

This paper introduces MM-UAVBench, a comprehensive benchmark for evaluating multimodal large language models in low-altitude UAV scenarios, highlighting current models' limitations and guiding future research.

Contribution

The paper presents MM-UAVBench, the first benchmark systematically assessing MLLMs' perception, cognition, and planning in UAV contexts with real-world data.

Findings

01 Current MLLMs struggle with complex UAV visual and cognitive tasks.

02 Identified bottlenecks include spatial bias and multi-view understanding.

03 The benchmark facilitates targeted improvements in UAV-related MLLMs.

Abstract

While Multimodal Large Language Models (MLLMs) have exhibited remarkable general intelligence across diverse domains, their potential in low-altitude applications dominated by Unmanned Aerial Vehicles (UAVs) remains largely underexplored. Existing MLLM benchmarks rarely cover the unique challenges of low-altitude scenarios, while UAV-related evaluations mainly focus on specific tasks such as localization or navigation, without a unified evaluation of MLLMs' general intelligence. To bridge this gap, we present MM-UAVBench, a comprehensive benchmark that systematically evaluates MLLMs across three core capability dimensions (perception, cognition, and planning) in low-altitude UAV scenarios. MM-UAVBench comprises 19 sub-tasks with over 5.7K manually annotated questions, all derived from real-world UAV data collected from public datasets. Extensive experiments on 16 open-source and…

Code & Models

Datasets

daisq/MM-UAVBench
dataset · 7.8k downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · UAV Applications and Optimization