GeoMamba: A Geometry-driven MambaVision Framework and Dataset for Fine-grained Optical-SAR Object Retrieval
Tiantong Fang, Xiuwei Wang, Jing Xiao, Wujie Zhou, Liang Liao, Mi Wang

TL;DR
GeoMamba is a geometry-driven framework for fine-grained optical-SAR object retrieval that strengthens cross-modal feature interaction and preserves structural information; it is validated on a new dataset.
Contribution
The paper introduces GeoMamba, a new framework with geometric feature injection and consistency constraints, and provides a new dataset for unaligned cross-modal retrieval.
Findings
GeoMamba achieves 63.3% mAP on the FGOS-as dataset.
GeoMamba outperforms existing methods in all-to-all retrieval.
The framework effectively preserves object structures during retrieval.
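For readers unfamiliar with the mAP metric quoted above, the following is a minimal sketch of how mean average precision is commonly computed for embedding-based retrieval. It is not the paper's evaluation code: the function name `mean_average_precision`, the use of cosine similarity, and the label-match relevance rule are all assumptions made for illustration, and the paper's actual protocol may differ.

```python
import numpy as np


def mean_average_precision(query_feats, query_labels, gallery_feats, gallery_labels):
    """Illustrative mAP for retrieval (hypothetical helper, not from the paper).

    A gallery item counts as relevant when its label equals the query label.
    """
    # L2-normalise so that the dot product equals cosine similarity (assumed metric).
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = q @ g.T

    average_precisions = []
    for i in range(len(q)):
        # Rank gallery items from most to least similar.
        order = np.argsort(-sims[i])
        hits = (gallery_labels[order] == query_labels[i]).astype(float)
        if hits.sum() == 0:
            # Queries with no relevant gallery item are skipped.
            continue
        # Precision at each rank, averaged over the ranks where a hit occurs.
        precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
        average_precisions.append((precision_at_k * hits).sum() / hits.sum())

    return float(np.mean(average_precisions)) if average_precisions else 0.0


# Toy usage: one query (standing in for one modality) against three gallery items.
query = np.array([[1.0, 0.0]])
gallery = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
score = mean_average_precision(query, np.array([0]), gallery, np.array([1, 0, 0]))
```

In the toy call, the top-ranked gallery item has the wrong label, so the average precision falls below 1 even though both relevant items are retrieved.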
Abstract
Multi-source remote sensing enables complementary observation of ground objects, while cross-modal fine-grained object retrieval remains challenging, especially under unaligned optical and SAR conditions. Unlike conventional retrieval settings that rely on paired or spatially aligned samples, practical optical-SAR retrieval is affected by substantial modality discrepancy, speckle noise, and structural inconsistency, which limit robust cross-modal representation learning. To address this problem, we propose GeoMamba, a geometry-driven framework tailored for optical-SAR fine-grained retrieval. Specifically, GeoMamba introduces a Geometric Feature Injection (GFI) module that enhances cross-modal feature interaction and incorporates structural priors, thereby improving the robustness of SAR representations and promoting geometry-consistent feature learning. In addition, a Geometric…
