PromptMID: Modal Invariant Descriptors Based on Diffusion and Vision Foundation Models for Optical-SAR Image Matching

Han Nie; Bin Luo; Jun Liu; Zhitao Fu; Huan Zhou; Shuo Zhang; Weixing Liu

arXiv:2502.18104 · cs.CV · September 22, 2025

TL;DR

PromptMID introduces a novel method for optical-SAR image matching that leverages diffusion and vision foundation models to create modality-invariant descriptors, significantly improving cross-domain generalization without extensive retraining.

Contribution

It proposes PromptMID, a new approach using text prompts and foundation models to generate modality-invariant features for optical-SAR image matching, enhancing generalization across unseen domains.

Findings

01. Outperforms state-of-the-art methods in diverse regions
02. Achieves superior results in both seen and unseen domains
03. Demonstrates strong cross-domain generalization capabilities

Abstract

The ideal goal of image matching is to achieve stable and efficient performance in unseen domains. However, many existing learning-based optical-SAR image matching methods, despite their effectiveness in specific scenarios, exhibit limited generalization and struggle to adapt to practical applications. Repeatedly training or fine-tuning matching models to address domain differences is not only inelegant but also introduces additional computational overhead and data production costs. In recent years, general foundation models have shown great potential for enhancing generalization. However, the disparity in visual domains between natural and remote sensing images poses challenges for their direct application. Therefore, effectively leveraging foundation models to improve the generalization of optical-SAR image matching remains a challenge. To address the above challenges, we…

Code & Models

Repositories

hanniewhu/promptmid (Official)

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Medical Image Segmentation Techniques

Methods: Diffusion