UAVD-Mamba: Deformable Token Fusion Vision Mamba for Multimodal UAV Detection
Wei Li, Jiaman Tang, Yang Li, Beihao Xia, Ligang Tan, Hongmao Qin

TL;DR
UAVD-Mamba is a Mamba-based multimodal framework for object detection in UAV imagery. It combines deformable token fusion with multiscale feature extraction, targets small and occluded objects, and reports a 3.6% mAP gain over the OAFA baseline on the DroneVehicle dataset.
Contribution
The paper proposes Deformable Token Mamba Blocks for adaptive feature extraction and multiscale detection, enhancing multimodal UAV object detection performance.
Findings
Outperforms the OAFA baseline by 3.6% mAP on the DroneVehicle dataset
Effectively detects small and occluded objects
Utilizes deformable convolutions for geometric adaptability
Abstract
Unmanned Aerial Vehicle (UAV) object detection has been widely used in traffic management, agriculture, emergency rescue, etc. However, it faces significant challenges, including occlusions, small object sizes, and irregular shapes. These challenges highlight the necessity for a robust and efficient multimodal UAV object detection method. Mamba has demonstrated considerable potential in multimodal image fusion. Leveraging this, we propose UAVD-Mamba, a multimodal UAV object detection framework based on Mamba architectures. To improve geometric adaptability, we propose the Deformable Token Mamba Block (DTMB) to generate deformable tokens by incorporating adaptive patches from deformable convolutions alongside normal patches from normal convolutions, which serve as the inputs to the Mamba Block. To optimize the multimodal feature complementarity, we design two separate DTMBs for the RGB…
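The deformable-token idea in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. It assumes a single-channel image, a fixed hypothetical offset table in place of the learned offsets a real deformable convolution would predict, and fusion of the normal and adaptive patches by simple addition (the abstract does not state the fusion rule). The Mamba block that would consume the tokens is not modelled, and all function names are illustrative.

```python
import math

def bilinear(img, y, x):
    """Sample img (a list of rows) at fractional (y, x); out-of-bounds reads as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    fy, fx = y - y0, x - x0

    def px(r, c):
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    return (px(y0, x0) * (1 - fy) * (1 - fx)
            + px(y0, x0 + 1) * (1 - fy) * fx
            + px(y0 + 1, x0) * fy * (1 - fx)
            + px(y0 + 1, x0 + 1) * fy * fx)

def conv_token(img, kernel, cy, cx, offsets=None):
    """One k x k convolution response centred at (cy, cx).

    offsets (assumption: fixed, not learned) maps a kernel tap (i, j)
    to a (dy, dx) shift of its sampling location. None means a normal
    convolution on the regular grid.
    """
    k = len(kernel)
    r = k // 2
    total = 0.0
    for i in range(k):
        for j in range(k):
            dy, dx = (0.0, 0.0) if offsets is None else offsets.get((i, j), (0.0, 0.0))
            total += kernel[i][j] * bilinear(img, cy + i - r + dy, cx + j - r + dx)
    return total

def deformable_tokens(img, kernel, stride, offsets):
    """Build one token per patch centre from a normal and an adaptive patch.

    Fusion by addition is an assumption made for this sketch.
    """
    h, w = len(img), len(img[0])
    r = len(kernel) // 2
    tokens = []
    for cy in range(r, h - r, stride):
        for cx in range(r, w - r, stride):
            normal = conv_token(img, kernel, cy, cx)
            adaptive = conv_token(img, kernel, cy, cx, offsets)
            tokens.append(normal + adaptive)
    return tokens

# Toy input: an 8 x 8 horizontal ramp, a 3 x 3 averaging kernel, and
# every kernel tap shifted half a pixel to the right.
ramp = [[float(x) for x in range(8)] for _ in range(8)]
mean_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]
shift_right = {(i, j): (0.0, 0.5) for i in range(3) for j in range(3)}
tokens = deformable_tokens(ramp, mean_kernel, stride=3, offsets=shift_right)
```

With all offsets set to zero the adaptive patch coincides with the normal patch, so any difference between the two branches comes entirely from the shifted sampling locations. In a trained model those shifts would be predicted per location from the input features.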
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image Fusion Techniques · UAV Applications and Optimization
