Drone Referring Localization: An Efficient Heterogeneous Spatial Feature Interaction Method For UAV Self-Localization
Ming Dai, Enhui Zheng, Jiahao Chen, Lei Qi, Zhenhua Feng, Wankou Yang

TL;DR
This paper introduces a novel transformer-based heterogeneous feature interaction method called Drone Referring Localization (DRL) for UAV self-localization, improving accuracy and efficiency over traditional image retrieval approaches.
Contribution
The paper proposes the DRL framework with two transformer-based models, new data augmentation and loss techniques, and a new dataset UL14 for UAV self-localization.
Findings
DRL achieves +9.4% localization accuracy (MA@20)
DRL reduces computational time to 1/7
DRL decreases storage overhead to 1/3
Abstract
Image retrieval (IR) has emerged as a promising approach for self-localization in unmanned aerial vehicles (UAVs). However, IR-based methods face several challenges: 1) Pre- and post-processing incur significant computational and storage overhead; 2) The lack of interaction between dual-source features impairs precise spatial perception. In this paper, we propose an efficient heterogeneous spatial feature interaction method, termed Drone Referring Localization (DRL), which aims to localize UAV-view images within satellite imagery. Unlike conventional methods that treat different data sources in isolation, followed by cosine similarity computations, DRL facilitates the learnable interaction of heterogeneous features. To implement the proposed DRL, we design two transformer-based frameworks, Post-Fusion and Mix-Fusion, enabling end-to-end training and inference. Furthermore, we introduce…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsRobotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Greedy Policy Search
