SAM Guided Semantic and Motion Changed Region Mining for Remote Sensing Change Captioning
Futian Wang, Mengqi Wang, Xiao Wang, Haowen Wang, Jin Tang

TL;DR
This paper introduces a novel change captioning method for remote sensing images that combines the Segment Anything Model with a knowledge graph and Transformer-based fusion to improve region awareness and temporal alignment, achieving state-of-the-art results.
Contribution
It proposes integrating SAM for region-level change detection and a knowledge graph for enhanced semantic understanding into remote sensing change captioning.
Findings
Achieves state-of-the-art performance on benchmark datasets
Effectively delineates semantic and motion change regions
Enhances captioning accuracy with region-aware features
Abstract
Remote sensing change captioning is an emerging and popular research task that aims to describe, in natural language, the content of interest that has changed between two remote sensing images captured at different times. Existing methods typically employ CNNs/Transformers to extract visual representations from the given images or incorporate auxiliary tasks to enhance the final results, but they suffer from weak region awareness and limited temporal alignment. To address these issues, this paper explores the use of the SAM (Segment Anything Model) foundation model to extract region-level representations and inject region-of-interest knowledge into the captioning framework. Specifically, we employ a CNN/Transformer model to extract global-level vision features, leverage the SAM foundation model to delineate semantic- and motion-level change regions, and utilize a specially constructed knowledge graph…
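To make the idea of a region-level representation concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes, purely for illustration, that a binary segment mask is already available for each date (in practice such masks could come from a segmentation model such as SAM), that the changed region is the pixel-wise XOR of the two masks, and that the region feature is a masked average of a feature map. The names `changed_region` and `masked_average_pool` are hypothetical.

```python
from typing import List

Mask = List[List[int]]                 # H x W binary mask (1 = inside a segment)
FeatureMap = List[List[List[float]]]   # H x W x C feature map


def changed_region(mask_before: Mask, mask_after: Mask) -> Mask:
    """Mark pixels whose segment membership differs between the two dates.

    Assumption: change is approximated by a pixel-wise XOR of the masks.
    """
    return [[a ^ b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_before, mask_after)]


def masked_average_pool(features: FeatureMap, mask: Mask) -> List[float]:
    """Average the feature vectors of the pixels selected by the mask."""
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for vec, keep in zip(feat_row, mask_row):
            if keep:
                count += 1
                for c in range(channels):
                    total[c] += vec[c]
    if count == 0:
        return total  # no changed pixels: return an all-zero vector
    return [t / count for t in total]


# Toy example: one pixel (top right) changes between the two dates.
before = [[1, 0], [0, 0]]
after = [[1, 1], [0, 0]]
features = [[[1.0, 2.0], [3.0, 4.0]],
            [[5.0, 6.0], [7.0, 8.0]]]

change = changed_region(before, after)
region_feature = masked_average_pool(features, change)
```

In this toy example only the top-right pixel is marked as changed, so the pooled region feature equals that pixel's feature vector. A captioning model could then combine such a region vector with the global image features; how the paper actually performs that fusion is not specified in the text above.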
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Geographic Information Systems Studies
