UAVD-Mamba: Deformable Token Fusion Vision Mamba for Multimodal UAV Detection

Wei Li; Jiaman Tang; Yang Li; Beihao Xia; Ligang Tan; Hongmao Qin

arXiv:2507.00849·cs.CV·July 2, 2025

TL;DR

UAVD-Mamba is a multimodal UAV detection framework that uses deformable token fusion and multiscale feature extraction to target small and occluded objects in UAV imagery; it outperforms the OAFA baseline by 3.6% mAP on the DroneVehicle dataset.

Contribution

The paper proposes Deformable Token Mamba Blocks for adaptive feature extraction and multiscale detection, enhancing multimodal UAV object detection performance.

Findings

01

Outperforms the OAFA baseline by 3.6% mAP on the DroneVehicle dataset

02

Effectively detects small and occluded objects

03

Utilizes deformable convolutions for geometric adaptability

Abstract

Unmanned Aerial Vehicle (UAV) object detection has been widely used in traffic management, agriculture, emergency rescue, etc. However, it faces significant challenges, including occlusions, small object sizes, and irregular shapes. These challenges highlight the necessity for a robust and efficient multimodal UAV object detection method. Mamba has demonstrated considerable potential in multimodal image fusion. Leveraging this, we propose UAVD-Mamba, a multimodal UAV object detection framework based on Mamba architectures. To improve geometric adaptability, we propose the Deformable Token Mamba Block (DTMB) to generate deformable tokens by incorporating adaptive patches from deformable convolutions alongside normal patches from normal convolutions, which serve as the inputs to the Mamba Block. To optimize the multimodal feature complementarity, we design two separate DTMBs for the RGB…
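
The idea of pairing normal patches with adaptive patches can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the abstract does not say how the two kinds of patches are combined, so the element-wise sum below is an assumption, as are the single-channel image, the averaging kernel, and the hand-written `shift_right` offset function (a real deformable convolution predicts its offsets from the input with a learned layer). All names here are hypothetical.

```python
import math

def bilinear(img, y, x):
    """Sample img at a fractional (y, x); positions outside the image read as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    for yy, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < h and 0 <= xx < w:
                val += wy * wx * img[yy][xx]
    return val

def patch_tokens(img, kernel, offset_fn=None):
    """Non-overlapping k x k patch embedding (stride = k), one scalar per patch.

    offset_fn(i, j, a, b) -> (dy, dx) shifts each sampling point.
    With offset_fn=None the grid is regular, i.e. an ordinary convolution.
    """
    k = len(kernel)
    h, w = len(img), len(img[0])
    tokens = []
    for i in range(0, h - k + 1, k):
        for j in range(0, w - k + 1, k):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    dy, dx = (0.0, 0.0) if offset_fn is None else offset_fn(i, j, a, b)
                    acc += kernel[a][b] * bilinear(img, i + a + dy, j + b + dx)
            tokens.append(acc)
    return tokens

def shift_right(i, j, a, b):
    """Hypothetical stand-in for learned offsets: move every sample half a pixel right."""
    return (0.0, 0.5)

# 4 x 4 toy image with values 0..15 and a 2 x 2 averaging kernel.
img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
avg_kernel = [[0.25, 0.25], [0.25, 0.25]]

normal = patch_tokens(img, avg_kernel)                   # regular sampling grid
deformable = patch_tokens(img, avg_kernel, shift_right)  # shifted sampling grid
# Assumed fusion rule: element-wise sum of the two token sets.
fused = [n + d for n, d in zip(normal, deformable)]
print(normal, deformable, fused)
```

In this toy setting the two token lists differ only because the sampling grid moved, which is the geometric flexibility the abstract attributes to deformable convolutions; the fused list would then play the role of the token sequence passed to a Mamba block.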

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image Fusion Techniques · UAV Applications and Optimization