SAR-TEXT: A Large-Scale SAR Image-Text Dataset Built with SAR-Narrator and A Progressive Learning Strategy for Downstream Tasks
Yiguo He, Xinjun Cheng, Junjie Zhu, Chunping Qiu, Jun Wang, Xichuan Zhang, Qiangjuan Huang, Ke Yang

TL;DR
This paper introduces SAR-TEXT, a large-scale SAR image-text dataset built with the multi-stage SAR-Narrator framework, and evaluates it on image-text retrieval, image captioning, and visual question answering (VQA) using three models constructed on the dataset.
Contribution
The paper presents SAR-TEXT, the first large-scale high-quality SAR image-text dataset, and proposes the SAR-Narrator framework for generating descriptions, enabling advanced vision-language modeling in SAR imagery.
Findings
SAR-RS-CLIP improves retrieval recall by over 12%.
SAR-RS-CoCa significantly enhances captioning metrics.
SAR-GPT achieves superior VQA performance.
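The retrieval finding above is stated in terms of recall. As a rough illustration of how recall@K is commonly computed for image-text retrieval, here is a minimal sketch. It is not the paper's evaluation code: the function name `recall_at_k`, the assumption that query i matches candidate i, and the toy similarity values are all hypothetical and chosen only for illustration.

```python
def recall_at_k(similarity, k):
    """Fraction of queries whose matching candidate is among the top-k scores.

    similarity[i][j] is the score between query i and candidate j.
    Assumption for this sketch: the correct candidate for query i is index i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank candidate indices by descending similarity score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy 3x3 similarity matrix (made-up numbers, not results from the paper).
toy = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.7],
]
r1 = recall_at_k(toy, 1)
r2 = recall_at_k(toy, 2)
print(f"R@1 = {r1:.3f}, R@2 = {r2:.3f}")
```

In practice the similarity matrix would come from the image and text embeddings of a CLIP-style model, and recall is usually reported at several values of K in both the image-to-text and text-to-image directions.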
Abstract
Vision Language Models (VLMs) have achieved remarkable breakthroughs in the field of remote sensing in recent years. Synthetic Aperture Radar (SAR) imagery, with its all-weather capability, is essential in remote sensing, yet the lack of large-scale, high-quality SAR image-text datasets hinders its semantic understanding. In this paper, we construct SAR-TEXT, a large-scale and high-quality dataset consisting of over 130,000 SAR image-text pairs. To construct the SAR-TEXT dataset, we design the SAR-Narrator framework, which generates textual descriptions for SAR images through a multi-stage strategy. To verify the effectiveness of the SAR-TEXT dataset, we conduct experiments on three typical vision-language tasks: image-text retrieval, image captioning, and visual question answering (VQA). Specifically, we construct three representative models on SAR-TEXT: SAR-RS-CLIP, SAR-RS-CoCa, and…
Taxonomy
Topics: Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications
