Change Captioning in Remote Sensing: Evolution to SAT-Cap -- A Single-Stage Transformer Approach

Yuduo Wang; Weikang Yu; Pedram Ghamisi

arXiv:2501.08114 · cs.CV · January 15, 2025

TL;DR

This paper introduces SAT-Cap, a single-stage transformer model for remote sensing change captioning. It reduces model complexity, extracts richer semantic detail from each image, and outperforms existing methods on the LEVIR-CC and DUBAI-CC datasets.

Contribution

The paper presents SAT-Cap, a novel single-stage transformer approach with spatial-channel attention and cosine similarity fusion for more efficient and detailed change captioning in remote sensing images.

Findings

01

Achieves a CIDEr score of 140.23% on the LEVIR-CC dataset.

02

Achieves a CIDEr score of 97.74% on the DUBAI-CC dataset.

03

Outperforms current state-of-the-art methods.

Abstract

Change captioning has become essential for accurately describing changes in multi-temporal remote sensing data, providing an intuitive way to monitor Earth's dynamics through natural language. However, existing change captioning methods face two key challenges: high computational demands due to multi-stage fusion strategies, and insufficient detail in object descriptions due to limited semantic extraction from individual images. To address these challenges, we propose SAT-Cap, a transformer-based model with single-stage feature fusion for remote sensing change captioning. In particular, SAT-Cap integrates a Spatial-Channel Attention Encoder, a Difference-Guided Fusion module, and a Caption Decoder. Compared to typical models that require multi-stage fusion in the transformer encoder and fusion module, SAT-Cap uses only a simple cosine similarity-based fusion module for information…
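To make the idea of cosine similarity-based fusion more concrete, the following is a minimal NumPy sketch of one way such a step could work. It is not the authors' implementation: the abstract is truncated and does not specify the module's details. The function names `cosine_similarity_map` and `difference_guided_fusion`, the `(C, H, W)` feature layout, and the choice to weight the feature difference by `(1 - similarity) / 2` are all assumptions made for illustration.

```python
import numpy as np


def cosine_similarity_map(feat_before, feat_after, eps=1e-8):
    """Per-location cosine similarity between two (C, H, W) feature maps.

    Hypothetical helper: compares the channel vectors at each spatial
    position and returns an (H, W) map with values in [-1, 1].
    """
    dot = np.sum(feat_before * feat_after, axis=0)
    norms = np.linalg.norm(feat_before, axis=0) * np.linalg.norm(feat_after, axis=0)
    return dot / (norms + eps)


def difference_guided_fusion(feat_before, feat_after):
    """Fuse bi-temporal features in a single step (illustrative assumption).

    Locations whose features point in similar directions get a weight near 0,
    dissimilar locations get a weight near 1, and that weight scales the
    raw feature difference.
    """
    sim = cosine_similarity_map(feat_before, feat_after)   # (H, W)
    change_weight = (1.0 - sim) / 2.0                      # rescaled to [0, 1]
    diff = feat_after - feat_before                        # (C, H, W)
    return change_weight[None, :, :] * diff                # broadcast over channels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    before = rng.normal(size=(8, 4, 4))   # stand-in for encoder features at time 1
    after = before.copy()
    after[:, 0, 0] = -before[:, 0, 0]     # simulate a change at one location
    fused = difference_guided_fusion(before, after)
    print(fused.shape)
```

In this sketch the fused map is zero wherever the two feature maps agree, so a downstream decoder would only see signal at changed locations. A single pass like this has no learned parameters, which is one plausible reading of why a similarity-based fusion could be cheaper than stacking several fusion stages.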

Taxonomy

Methods: Softmax · Attention Is All You Need