RemoteDet-Mamba: A Hybrid Mamba-CNN Network for Multi-modal Object Detection in Remote Sensing Images
Kejun Ren, Xin Wu, Lianming Xu, Li Wang

TL;DR
RemoteDet-Mamba is a novel multi-modal remote sensing object detection network that enhances small object detection and inter-class discrimination through a patch-level fusion strategy, achieving high performance with low computational cost.
Contribution
The paper introduces RemoteDet-Mamba, a hybrid Mamba-CNN network with a patch-level four-direction selective scanning fusion strategy for improved remote sensing object detection.
Findings
Outperforms current mainstream methods on the DroneVehicle dataset
Achieves high detection accuracy with low parameter count
Effectively decouples dense targets and reduces computational complexity
Abstract
Unmanned Aerial Vehicle (UAV) remote sensing, with its advantages of rapid information acquisition and low cost, has been widely applied in scenarios such as emergency response. However, due to the long imaging distance and complex imaging mechanisms, targets in remote sensing images often face challenges such as small object size, dense distribution, and low inter-class discriminability. To address these issues, this paper proposes a multi-modal remote sensing object detection network called RemoteDet-Mamba, which is based on a patch-level four-direction selective scanning fusion strategy. This method simultaneously learns unimodal local features and fuses cross-modal patch-level global semantic information, thereby enhancing the distinguishability of small objects and improving inter-class discrimination. Furthermore, the designed lightweight fusion mechanism effectively decouples…
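The abstract names a patch-level four-direction selective scanning strategy but does not spell out the scan orders. A minimal sketch of one common reading follows, assuming the four directions are row-major, reversed row-major, column-major, and reversed column-major traversals of the patch grid. The function names scan_four_directions and merge_four_directions are hypothetical, and the selective state-space step that would process each sequence is replaced here by an identity, so this illustrates only the scan and merge bookkeeping, not the authors' implementation.

```python
def scan_four_directions(grid):
    """Flatten a 2-D grid of patch tokens into four 1-D sequences.

    Assumed directions (not confirmed by the abstract): row-major,
    reversed row-major, column-major, reversed column-major.
    """
    h, w = len(grid), len(grid[0])
    row_major = [grid[r][c] for r in range(h) for c in range(w)]
    col_major = [grid[r][c] for c in range(w) for r in range(h)]
    return [row_major, row_major[::-1], col_major, col_major[::-1]]


def merge_four_directions(seqs, h, w):
    """Map the four sequences back onto the h x w grid and sum them."""
    row_f, row_b, col_f, col_b = seqs
    # Undo the reversals so each sequence is in its forward order again.
    row_b = row_b[::-1]
    col_b = col_b[::-1]
    merged = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            merged[r][c] = (
                row_f[r * w + c] + row_b[r * w + c]
                + col_f[c * h + r] + col_b[c * h + r]
            )
    return merged


# Toy 2 x 3 grid of scalar "patch tokens"; real tokens would be vectors.
patches = [[1, 2, 3], [4, 5, 6]]
scans = scan_four_directions(patches)
# A real model would run a sequence model over each scan here;
# this sketch passes the sequences through unchanged.
merged = merge_four_directions(scans, 2, 3)
```

Because the sequence model is an identity in this sketch, every merged cell is simply four times the original token. In a real network each direction would contribute different context to the same patch, which is the motivation for scanning in several orders.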
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Remote-Sensing Image Classification · Automated Road and Building Extraction
