UAVScenes: A Multi-Modal Dataset for UAVs

Sijie Wang; Siqi Li; Yawei Zhang; Shangshu Yu; Shenghai Yuan; Rui She; Quanjiang Guo; JinXuan Zheng; Ong Kang Howe; Leonrich Chandra; Shrivarshann Srijeyan; Aditya Sivadas; Toshan Aggarwal; Heyuan Liu; Hongming Zhang; Chujie Chen; Junyu Jiang; Lihua Xie; Wee Peng Tay

arXiv:2507.22412 · cs.CV · July 31, 2025

TL;DR

UAVScenes is a multi-modal UAV dataset with frame-wise semantic annotations for both camera images and LiDAR point clouds, supporting perception tasks such as segmentation, localization, and view synthesis.

Contribution

The paper introduces UAVScenes, a large-scale, multi-modal UAV dataset with detailed annotations for diverse high-level perception tasks, filling a critical gap in existing UAV datasets.

Findings

01. Provides annotations for both images and LiDAR point clouds
02. Enables benchmarking of multiple perception tasks
03. Supports high-level scene understanding in UAV applications

Abstract

Multi-modal perception is essential for unmanned aerial vehicle (UAV) operations, as it enables a comprehensive understanding of the UAVs' surrounding environment. However, most existing multi-modal UAV datasets are primarily biased toward localization and 3D reconstruction tasks, or only support map-level semantic segmentation due to the lack of frame-wise annotations for both camera images and LiDAR point clouds. This limitation prevents them from being used for high-level scene understanding tasks. To address this gap and advance multi-modal UAV perception, we introduce UAVScenes, a large-scale dataset designed to benchmark various tasks across both 2D and 3D modalities. Our benchmark dataset is built upon the well-calibrated multi-modal UAV dataset MARS-LVIG, originally developed only for simultaneous localization and mapping (SLAM). We enhance this dataset by providing manually…

Code & Models

Datasets

sijieaaa/UAVScenes
dataset · 51 downloads