GeoMamba: A Geometry-driven MambaVision Framework and Dataset for Fine-grained Optical-SAR Object Retrieval

Tiantong Fang; Xiuwei Wang; Jing Xiao; Wujie Zhou; Liang Liao; Mi Wang

arXiv:2605.19734 · cs.CV · May 20, 2026

TL;DR

GeoMamba is a geometry-driven framework for fine-grained optical-SAR object retrieval. It strengthens cross-modal feature interaction and preserves structural information, and it is validated on a new dataset introduced with the paper.

Contribution

The paper introduces GeoMamba, a new framework with geometric feature injection and consistency constraints, and provides a new dataset for unaligned cross-modal retrieval.

Findings

01

GeoMamba achieves 63.3% mAP on the FGOS-as dataset.

02

GeoMamba outperforms existing methods in all-to-all retrieval.

03

The framework effectively preserves object structures during retrieval.
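For readers unfamiliar with the mAP figure quoted in the findings, the following is a minimal sketch of how mean average precision is commonly computed for cross-modal retrieval. It is not the paper's evaluation code: the function name `mean_average_precision`, the use of cosine similarity, and the rule that a gallery item is relevant when its class label matches the query's are all assumptions made for illustration.

```python
import numpy as np


def mean_average_precision(query_feats, query_labels, gallery_feats, gallery_labels):
    """Hypothetical helper: mAP for label-based retrieval with cosine similarity.

    Assumes queries come from one modality (e.g. SAR) and the gallery from
    the other (e.g. optical); relevance means "same class label".
    """
    # L2-normalise so that the dot product equals cosine similarity.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = q @ g.T  # shape: (num_queries, num_gallery)

    average_precisions = []
    for i in range(sims.shape[0]):
        # Rank gallery items from most to least similar.
        order = np.argsort(-sims[i], kind="stable")
        relevant = gallery_labels[order] == query_labels[i]
        if not relevant.any():
            continue  # skip queries with no relevant gallery item

        hits = np.cumsum(relevant)                  # relevant items seen so far
        ranks = np.arange(1, relevant.size + 1)     # 1-based rank positions
        precision_at_hits = hits[relevant] / ranks[relevant]
        average_precisions.append(precision_at_hits.mean())

    return float(np.mean(average_precisions)) if average_precisions else 0.0


# Toy usage with made-up 2-D embeddings (not data from the paper).
queries = np.array([[1.0, 0.0]])
gallery = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
score = mean_average_precision(queries, np.array([0]), gallery, np.array([0, 1, 0]))
```

Under these assumptions, a query whose relevant items all rank first scores 1.0, and the score drops as irrelevant items are ranked ahead of relevant ones.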

Abstract

Multi-source remote sensing enables complementary observation of ground objects, while cross-modal fine-grained object retrieval remains challenging, especially under unaligned optical and SAR conditions. Unlike conventional retrieval settings that rely on paired or spatially aligned samples, practical optical-SAR retrieval is affected by substantial modality discrepancy, speckle noise, and structural inconsistency, which limit robust cross-modal representation learning. To address this problem, we propose GeoMamba, a geometry-driven framework tailored for optical-SAR fine-grained retrieval. Specifically, GeoMamba introduces a Geometric Feature Injection (GFI) module that enhances cross-modal feature interaction and incorporates structural priors, thereby improving the robustness of SAR representations and promoting geometry-consistent feature learning. In addition, a Geometric…
