Is your VLM Sky-Ready? A Comprehensive Spatial Intelligence Benchmark for UAV Navigation
Lingfeng Zhang, Yuchen Zhang, Hongsheng Li, Haoxiang Fu, Yingbo Tang, Hangjun Ye, Long Chen, Xiaojun Liang, Xiaoshuai Hao, Wenbo Ding

TL;DR
This paper introduces SpatialSky-Bench, a comprehensive benchmark for evaluating the spatial intelligence of Vision-Language Models in UAV navigation, revealing current limitations and proposing a new specialized VLM, Sky-VLM, that achieves state-of-the-art results.
Contribution
The paper presents a new benchmark, SpatialSky-Bench, and a large dataset, SpatialSky-Dataset, along with a specialized VLM, Sky-VLM, tailored for UAV spatial reasoning tasks.
Findings
Mainstream VLMs perform poorly in complex UAV scenarios.
Sky-VLM outperforms existing models across all benchmark tasks.
The benchmark reveals significant gaps in current VLM spatial capabilities.
Abstract
Vision-Language Models (VLMs), leveraging their powerful visual perception and reasoning capabilities, have been widely applied in Unmanned Aerial Vehicle (UAV) tasks. However, the spatial intelligence capabilities of existing VLMs in UAV scenarios remain largely unexplored, raising concerns about their effectiveness in navigating and interpreting dynamic environments. To bridge this gap, we introduce SpatialSky-Bench, a comprehensive benchmark specifically designed to evaluate the spatial intelligence capabilities of VLMs in UAV navigation. Our benchmark comprises two categories (Environmental Perception and Scene Understanding), divided into 13 subcategories, including bounding boxes, color, distance, height, and landing safety analysis, among others. Extensive evaluations of various mainstream open-source and closed-source VLMs reveal unsatisfactory performance in complex UAV navigation…
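The benchmark is organized as a two-level hierarchy (2 categories split into 13 subcategories). A minimal sketch of how per-subcategory scores could roll up into per-category scores is shown below. It is hypothetical, not the authors' evaluation code: it assumes each answer is judged simply correct or incorrect (the abstract does not say how, for example, bounding-box answers are scored), and it assumes a category score is the unweighted mean of its subcategory accuracies. The names `Item` and `score` are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    """One benchmark question with its (hypothetical) grading result."""
    category: str      # e.g. "Environmental Perception" or "Scene Understanding"
    subcategory: str   # e.g. "distance", "height", "landing safety"
    correct: bool      # ASSUMPTION: binary correct/incorrect grading


def score(items):
    """Return (accuracy per (category, subcategory), macro-average per category).

    ASSUMPTION: each category score is the unweighted mean of its
    subcategory accuracies, so small subcategories count as much as large ones.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for it in items:
        key = (it.category, it.subcategory)
        totals[key] += 1
        hits[key] += int(it.correct)

    # Accuracy within each subcategory.
    sub_acc = {key: hits[key] / totals[key] for key in totals}

    # Group subcategory accuracies by their parent category and average them.
    by_cat = defaultdict(list)
    for (cat, _sub), acc in sub_acc.items():
        by_cat[cat].append(acc)
    cat_acc = {cat: sum(accs) / len(accs) for cat, accs in by_cat.items()}
    return sub_acc, cat_acc
```

Macro-averaging keeps a subcategory with many questions from dominating its category score. A question-weighted (micro) average is an equally plausible alternative, and the abstract does not say which one the paper uses.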
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Advanced Neural Network Applications
