UEMM-Air: Make Unmanned Aerial Vehicles Perform More Multi-modal Tasks
Liang Yao, Fan Liu, Shengxiang Xu, Chuanyi Zhang, Xing Ma, Jianyu Jiang, Zequan Wang, Shimin Di, Jun Zhou

TL;DR
UEMM-Air is a synthetic, multi-modal UAV dataset created using Unreal Engine, designed to facilitate multi-task learning with precise annotations and improve model performance on UAV-related applications.
Contribution
We introduce UEMM-Air, a large synthetic multi-modal UAV dataset with automatic annotations and cross-modality support, addressing limitations of existing datasets.
Findings
Models trained on UEMM-Air perform better on downstream tasks.
UEMM-Air supports multiple modalities and tasks with high-quality annotations.
Benchmark results demonstrate the dataset's effectiveness for UAV multi-modal learning.
Abstract
The development of multi-modal learning for Unmanned Aerial Vehicles (UAVs) typically relies on a large amount of pixel-aligned multi-modal image data. However, existing datasets face challenges such as limited modalities, high construction costs, and imprecise annotations. To this end, we propose a synthetic multi-modal UAV-based multi-task dataset, UEMM-Air. Specifically, we simulate various UAV flight scenarios and object types using the Unreal Engine (UE). Then we design the UAV's flight logic to automatically collect data from different scenarios, perspectives, and altitudes. Furthermore, we propose a novel heuristic automatic annotation algorithm to generate accurate object detection labels. Finally, we utilize labels to generate text descriptions of images to make our UEMM-Air support more cross-modality tasks. In total, our UEMM-Air consists of 120k pairs of images with 6…
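The abstract mentions turning detection labels into text descriptions but, as excerpted here, does not say how. The following is only a minimal sketch of one plausible approach, a count-based template. The function name `describe_image`, the label layout, the class names, and the sentence template are all illustrative assumptions, not the authors' method.

```python
from collections import Counter


def describe_image(labels):
    """Build a one-sentence caption from detection labels.

    `labels` is assumed to be a list of (class_name, bbox) tuples, where
    bbox is (x_min, y_min, x_max, y_max). This layout is hypothetical and
    is not taken from the UEMM-Air release.
    """
    # Count how many boxes belong to each class; the boxes themselves are unused.
    counts = Counter(name for name, _bbox in labels)
    if not counts:
        return "An aerial image with no annotated objects."

    parts = []
    for name, n in sorted(counts.items()):
        # Naive pluralization: append "s". Irregular nouns would need a lookup table.
        noun = name if n == 1 else name + "s"
        parts.append(f"{n} {noun}")

    # Join as "a, b and c".
    if len(parts) == 1:
        listing = parts[0]
    else:
        listing = ", ".join(parts[:-1]) + " and " + parts[-1]
    return f"An aerial image containing {listing}."


if __name__ == "__main__":
    example = [
        ("car", (10, 20, 50, 60)),
        ("truck", (100, 40, 180, 90)),
        ("car", (200, 210, 240, 250)),
    ]
    print(describe_image(example))
    # prints: An aerial image containing 2 cars and 1 truck.
```

A template like this keeps captions exactly consistent with the annotations, which is the main attraction of deriving text from labels instead of writing it by hand. A richer variant could also use the box coordinates to describe object positions.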
Taxonomy
Topics: Robotics and Sensor-Based Localization · Infrared Target Detection Methodologies · Advanced Image and Video Retrieval Techniques
