SARVLM: A Vision Language Foundation Model for Semantic Understanding in SAR Imagery

Qiwei Ma; Xukun Lu; Wang Liu; Puhong Duan; Xudong Kang; Shutao Li

arXiv:2510.22665·cs.CV·May 18, 2026

TL;DR

SARVLM is a vision-language foundation model for semantic understanding of SAR imagery, trained on a dataset of over one million image-text pairs with a two-stage domain transfer strategy.

Contribution

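The paper introduces SARVLM, which it presents as the first SAR-specific vision-language model, together with SARVLM-1M, a dataset of over one million image-text pairs. Training follows a two-stage domain transfer approach that uses optical remote sensing data as a bridge between natural images and SAR.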

Findings

01 Outperforms state-of-the-art vision-language models on 13 benchmarks.

02 Demonstrates strong capabilities in image-text retrieval, object detection, and zero-shot classification.

03 Validates effectiveness through extensive experiments across diverse tasks.

Abstract

Synthetic Aperture Radar (SAR) is a critical imaging modality due to its all-weather operational capability. Although recent advances in self-supervised learning and masked image modeling (MIM) have enabled SAR foundation models, these approaches primarily focus on low-level visual features and often neglect multimodal representation. Moreover, multimodal data for SAR is scarce, limiting the development of robust cross-modal models. To address this limitation, we construct SARVLM-1M, a large-scale vision-language dataset comprising over one million image-text pairs aggregated from existing datasets. Furthermore, to mitigate the substantial differences between SAR and natural imagery, we propose a two-stage domain transfer training strategy that leverages optical remote sensing data as an intermediate bridge, facilitating effective knowledge transfer from natural images to the SAR domain.…
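
The two-stage transfer described in the abstract can be illustrated with a small sketch. The abstract as shown here does not state the training objective, so the CLIP-style symmetric contrastive loss below is an assumption, not the authors' confirmed method. The names `symmetric_contrastive_loss` and `TRANSFER_STAGES`, the temperature value, and the stage descriptions are hypothetical and chosen only for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def similarity_matrix(image_embs, text_embs, temperature=0.07):
    """Cosine similarity of every image with every text, divided by temperature."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
            for i in imgs]

def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct column for row k is column k."""
    total = 0.0
    for k, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[k]
    return total / len(logits)

def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Average of image-to-text and text-to-image losses (assumed objective)."""
    logits = similarity_matrix(image_embs, text_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))

# Hypothetical stage ordering, following the abstract's description of
# optical remote sensing data as the bridge between natural images and SAR.
TRANSFER_STAGES = [
    {"name": "stage_1",
     "init_from": "natural-image pretrained weights",
     "train_on": "optical remote sensing image-text pairs"},
    {"name": "stage_2",
     "init_from": "stage_1 weights",
     "train_on": "SAR image-text pairs"},
]
```

Under this assumed objective, matched image-text pairs give a loss near zero, while mismatched pairs are penalized heavily. Each stage would minimize the same loss and differ only in its starting weights and training data.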

Code & Models

Repositories

KlayMa527/SARVLM.git (GitHub)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced SAR Imaging Techniques · Multimodal Machine Learning Applications