SAM Guided Semantic and Motion Changed Region Mining for Remote Sensing Change Captioning

Futian Wang; Mengqi Wang; Xiao Wang; Haowen Wang; Jin Tang

arXiv:2511.21420 · cs.CV · November 27, 2025

TL;DR

This paper introduces a novel change captioning method for remote sensing images that combines the Segment Anything Model with a knowledge graph and Transformer-based fusion to improve region awareness and temporal alignment, achieving state-of-the-art results.

Contribution

The paper integrates SAM for region-level change detection and a knowledge graph for enhanced semantic understanding into a remote sensing change captioning framework.

Findings

01. Achieves state-of-the-art performance on benchmark datasets

02. Effectively delineates semantic and motion change regions

03. Enhances captioning accuracy with region-aware features

Abstract

Remote sensing change captioning is an emerging and popular research task that aims to describe, in natural language, the content of interest that has changed between two remote sensing images captured at different times. Existing methods typically employ CNNs/Transformers to extract visual representations from the given images, or incorporate auxiliary tasks to enhance the final results, but they suffer from weak region awareness and limited temporal alignment. To address these issues, this paper explores the use of the SAM (Segment Anything Model) foundation model to extract region-level representations and inject region-of-interest knowledge into the captioning framework. Specifically, we employ a CNN/Transformer model to extract global-level vision features, leverage the SAM foundation model to delineate semantic- and motion-level change regions, and utilize a specially constructed knowledge graph…
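
The abstract refers to region-level representations derived from SAM-delineated regions, but the portion shown here does not say how those representations are computed. One common, generic way to turn a segmentation mask into a region feature is masked average pooling, and the plain-Python sketch below illustrates only that technique. The function names, the toy feature map, and the masks are hypothetical; this is an assumption for illustration, not the authors' implementation.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # H x W x C grid of feature vectors
Mask = List[List[int]]                # H x W grid, 1 marks pixels inside the region


def masked_average_pool(features: FeatureMap, mask: Mask) -> List[float]:
    """Average the feature vectors at all pixels where the mask is 1."""
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for vec, inside in zip(feat_row, mask_row):
            if inside:
                count += 1
                for c in range(channels):
                    total[c] += vec[c]
    if count == 0:
        return total  # empty mask: fall back to a zero vector
    return [t / count for t in total]


def region_descriptors(features: FeatureMap, masks: List[Mask]) -> List[List[float]]:
    """Compute one pooled descriptor per region mask (hypothetical helper)."""
    return [masked_average_pool(features, m) for m in masks]


# Toy 2 x 2 feature map with 2 channels (made-up values).
features = [
    [[1.0, 0.0], [3.0, 2.0]],
    [[5.0, 4.0], [7.0, 6.0]],
]
# Two hand-written masks standing in for segmentation output.
masks = [
    [[1, 1], [0, 0]],  # top row
    [[0, 0], [0, 1]],  # bottom-right pixel only
]
descriptors = region_descriptors(features, masks)
print(descriptors)
```

In a real pipeline the masks would come from a segmentation model and the feature map from a vision backbone; the pooled vectors could then be passed to a fusion module alongside global features.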

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Geographic Information Systems Studies